Monday, November 17, 2008

JSON support in native code.


I recently needed to look at JSON support libraries for a project I'm working on. A native C or C++ implementation was required; Boost libraries were available to use. This is a brief report on what I've found so far.

json-c v0.7 (internally it's marked as 0.3) was the first one I looked at, mainly because I had used it before in an Objective-C environment.
Building it on OSX was easy, creating a libjson.dylib that I could link to. The programming interface is basic C: creating 'json_object's and adding them to other 'json_object's and 'json_object_array's. There is an incremental tokenizer/parser, as well as the ability to convert a JSON object into a standard JSON string. It's simple, seems lightweight, and I know it works since I've used it before.

TINYJSON-1.3.0 is the next one I examined. This one is heavily dependent upon the Boost C++ Template Libraries, also pulling in the Boost Spirit parser, which I'm unfamiliar with. This was my first attempt at using Boost on OSX, so I first had to pull it down, build and install it; not a big deal. The TinyJSON distribution included a basic set of Boost TEST_CASE's, which took me a while to figure out how to configure in XCode. The next hurdle is the programming model - it's all based on C++ templates, which is useful but sometimes hard to fathom.
The main problem with it, though, is that it's only defined for reading JSON data, not writing it. I need writing.
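For reference, the "writing" half I need isn't much code even by hand. Here's a minimal sketch - these are my own hypothetical helpers, nothing to do with the TinyJSON API - covering just flat string-to-string objects:

```cpp
#include <map>
#include <sstream>
#include <string>

// Escape the two characters that must be escaped inside a JSON string.
std::string json_escape(const std::string &s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '"' || s[i] == '\\') out += '\\';
        out += s[i];
    }
    return out;
}

// Emit a flat map as a JSON object, e.g. {"key": "value", ...}.
std::string to_json(const std::map<std::string, std::string> &obj) {
    std::ostringstream out;
    out << "{";
    for (std::map<std::string, std::string>::const_iterator it = obj.begin();
         it != obj.end(); ++it) {
        if (it != obj.begin()) out << ", ";
        out << "\"" << json_escape(it->first) << "\": \""
            << json_escape(it->second) << "\"";
    }
    out << "}";
    return out.str();
}
```

A real writer also needs to escape control characters and handle nested values and other types, which is exactly what I'd rather get from a library.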

JSONCPP is the last one I've tried so far. This one looks promising, but I've had problems just getting it to build - it uses something called 'scons' to manage its build process. After following all the instructions, I had no luck. I tried pulling the project tree into XCode and building it manually, and was able to make that work at least. The object structure looks promising, although it has the usual C++ verbosity. Worse, though, there are no examples, and I find that all of the test code is written in Python and doesn't work. I've probably not built something correctly; I just don't know what yet.

A couple of other libraries that I haven't had time to examine closely yet: Jost and Jaula; the former is C++, the latter strictly C. A closer look at Jost is required, as I see it is also Boost-based and it seems relatively simple.

After this little exercise, I'll probably just use the libjson support - it's simple and it works. If/when I get more time, I'll look into the C++ versions again.
I was surprised at the dearth of C++-based versions; I suppose this is because the libjson support is good enough.

9/2009: I ended up doing this for another project too, and again ran into problems with the Boost-based options. Since this was a C++ project I ended up using the Cajun library, and find that it works pretty well. It's a bit stream-oriented, but that can probably be changed if it gets in the way too much, or becomes a performance bottleneck. I found it at http://cajun-jsonapi.sourceforge.net/

Sunday, August 31, 2008

OSCON 2008 summary (belated)

I attended OSCON 2008 in Portland, Oregon in July.

It's a good opportunity to see what the Open Source community is up to, and which interesting
technologies are capturing attention. There were a lot of presentations, many of them concurrent; fortunately, many of them have also made it online.

I took a few pictures while I was there; you can find them at my Flickr OSCON set.
(Also, they had a photo contest from prior years - and one of my balloon photos from 2007 won first place!)

General themes I found most interesting, and which had a lot of other interest:
  • large data sets and their implementations (Hadoop, Bigdata)
  • scaling and performance (Facebook, Flickr, and others)
  • propagating data using Jabber's XMPP protocol (aka XMPP Pub Sub)
  • OAuth (open authentication protocol for services)
  • Microblogging (identi.ca)
  • dynamic languages (Groovy, JRuby, Ruby, Erlang)
  • web frameworks (Django, RoR, etc.)
  • virtualization (I attended only a couple of these, lots of interest though)
Of the sessions I attended, here are a few highlights.

****
Open source applications making inroads into mainstream usage:
Django, Alfresco, Zimbra.

****
The XMPP Pub Sub idea got a lot of interest. It was inspired by the massive crawling
that FriendFeed was performing on Flickr, looking for updated images. The idea enables a listener to subscribe to content changes, reusing the Jabber XMPP IM protocol (with an Atom payload.)
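As a rough illustration of the shape this takes - following XEP-0060's pubsub event notifications, with the hostnames, node name, and payload here entirely invented - a subscriber would receive something like:

```xml
<message from='pubsub.flickr.example.org' to='subscriber@example.com'>
  <event xmlns='http://jabber.org/protocol/pubsub#event'>
    <items node='photo-updates'>
      <item id='ae890ac52d0df67ed7cfdf51b644e901'>
        <entry xmlns='http://www.w3.org/2005/Atom'>
          <title>photo updated</title>
          <updated>2008-07-23T18:30:02Z</updated>
        </entry>
      </item>
    </items>
  </event>
</message>
```

The point is that the subscriber gets pushed an Atom entry the moment content changes, instead of polling the way FriendFeed was.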

****
Groovy vs. JRuby: a discussion of when you might prefer one language over the other.
The summary:
  • Groovy had a better fit with Java, good (and improving) performance and is best for tight and/or heavy Java code integration. Also there is a Groovy compiler that will generate .JAR files, useful for "stealth" integration.
  • JRuby code integration with Java good but still needs work. Best for general scripting and light code integration. But you have both the JDK and Ruby libraries to work with.
Note also that Sun has dedicated resources for JRuby (and Jython)

****
Facebook developers mentioned they had >400 memcached hosts, using multi-retrieval code
that they've written (and shared.)

Also some discussion on using TCP vs UDP, and about high-latency problems caused by East/West coast server traffic.

****

Web Frameworks

There were several discussions centered on frameworks. One new term was SOFEA. The idea consists mainly of moving more functionality into the client, rather than relying upon a server-side framework to perform everything. (Basically this is what's happening with the RIA model, as Dojo, Laszlo, and Flex enable.)

Some quick summary judgements on different frameworks:
  • OpenLaszlo: pummeled by Adobe
  • JavaFX: no one using it (yet at least)
  • Rails: ActiveScaffold adds REST and Ajax, Google Trends shows peak in 2005.
  • Grails: scaffolding not very good yet, but has better performance than Rails
  • Flex: Flex + Rails an interesting platform, doesn't yet support all HTML well.

Performance notes:
LinkedIn has a Rails-based Facebook application that supports 1M requests/month - named "Bumper Sticker".

On a comparison scale, in terms of ms/iteration, we see that:
  • Java, C++ : very low, less than 1;
  • JRuby: 100
  • Groovy: 215
  • Python: 225
  • PHP: 600

In a session I missed, but saw notes from later, there have been some significant improvements in Ruby performance too. They've introduced a compiler that generates code for LLVM, and it's much faster than the C-based "Matz" interpreter. The name of this project is Rubinius.

****

Mozilla developers discussed ways to implement static analysis of C++ programs.
They have developed a plug-in for GCC that lets them run JavaScript in the compiler, giving
them access to the program graphs directly. This is the "Dehydra" project.

They're already using this technique to refactor some of their existing code - and they're
actually converting it into JavaScript - more on this in a minute. Now that they have access to the program graph, they're looking for other things they can do to the code too: security analysis, bug analysis, standards enforcement, etc.

About the JavaScript conversion: they're using Trace Compilation in the interpreter, which takes
care of performance bottlenecks. It doesn't fix one-time execution, though. Still, they claim that
much of their code is easily translated into JS, and it's just as fast as the C++ and safer. The target language is JS 2.0, which allows for class/struct within traditional JS objects, making
them safe for C access.

****

The Open Microblogging discussions were interesting. 'identi.ca' is an open source service using the Twitter API that's federated (supports multiple servers). The project code is 'laconi.ca'. It currently uses HTTP between servers, but they know this won't scale; they intend to use the XMPP Pub Sub idea. There's a lot of interest in this technology, and some good ideas.

****

Mark Shuttleworth from Canonical (Ubuntu's business org) gave an interesting talk about their development practices. It's a mix of lean and agile techniques. They really try to amplify learning and not specialization. "Decide late, deliver early" was mentioned. But then later he talked about how knowledge and expertise were more important than colocation, so obviously some level of specialization occurs.

Some of the other practices mentioned:
  • cadence/cycle
  • track bugs, features, ideas
  • branch/merge: keeps the cadence trunk pristine; merging is important.
  • code review, but they stay away from voting, too divisive
  • automated tests: unit, integration, utilization, full app, profile usage.
  • pre-commit testing, with trunk locked to everyone but a robot that runs test before commit.
****
There was a good discussion on Python & C++ integration using the SWIG modules.
SWIG is now based on a full compiler that reads C/C++ declarations and generates C extensions that allow Python access. Python code can even extend and override C++ classes.
One might have to clean up the headers a little, but often it works just fine. While it's not the best generated code, it works and cuts out most of the work that would otherwise need to be done by hand.
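As an illustration of how little glue is involved, a SWIG interface file is typically just a few lines. This one assumes a hypothetical C++ header counter.h declaring some class - the module and file names here are made up:

```
/* counter.i - hypothetical SWIG interface file */
%module counter
%{
#include "counter.h"   /* compiled into the generated wrapper */
%}
%include "counter.h"   /* SWIG parses the same header to generate bindings */
```

Running `swig -c++ -python counter.i` produces the C extension source plus a counter.py shim; once the extension is compiled, Python code just does `import counter` and uses the classes directly.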

As an example, they put wxPython on it: 6M LOC, 90-95% generated by SWIG, enabling one person to do most of the maintenance on it. (I experimented with SWIG 2-3 years ago and wasn't encouraged; looks like it's improved!)

****
A different discussion, from an engineer at Meebo, described how they implemented a hiring process, and the trials & tribulations they went through trying to "staff up". One unusual practice they instituted was a "simulation" where candidates did a 4-hour exercise reflecting "everyday" tasks. (Can't have candidates do real work though.)

****
There was the big announcement that Microsoft was becoming a Platinum member of the Apache Software Foundation, as well as assurances that they are continuing to evaluate and license "open source" technology. PFIF/Samba agreement, PHP support in Win2008 (ADODB), Ruby libs were all mentioned.

****