Sunday, August 31, 2008

OSCON 2008 summary (belated)

I attended OSCON 2008 in Portland, Oregon in July.

It's a good opportunity to see what the Open Source community is up to, and which interesting
technologies are capturing its attention. There were a lot of presentations, many of them concurrent; fortunately, many of them have also made it online.

I took a few pictures while I was there; you can find them in my Flickr OSCON set.
(Also, they had a photo contest from prior years - and one of my balloon photos from 2007 won first place!)

General themes I found most interesting, and which drew a lot of interest from others:
  • large data sets and their implementations (Hadoop, Bigdata)
  • scaling and performance (Facebook, Flickr, and others)
  • propagating data using Jabber's XMPP protocol (aka XMPP Pub Sub)
  • OAuth (open protocol for delegated authorization between services)
  • Microblogging (identi.ca)
  • dynamic languages (Groovy, JRuby, Ruby, Erlang)
  • web frameworks (Django, RoR, etc.)
  • virtualization (I attended only a couple of these, lots of interest though)
Of the sessions I attended, here are a few highlights.

****
Open source applications making inroads into mainstream usage:
Django, Alfresco, Zimbra.

****
The XMPP Pub Sub idea got a lot of interest. It was inspired by the massive crawling
that FriendFeed was performing on Flickr, looking for updated images. The idea is to let a listener subscribe to content changes instead of polling for them, reusing the Jabber XMPP IM protocol (with an Atom payload).
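
To make the subscribe-instead-of-crawl idea concrete, here is a minimal in-memory sketch. It is not XMPP and not anyone's real implementation: the PubSubNode class, the topic names, and the Atom-like payload strings are all made up for illustration. The point is only that the publisher pushes each change to whoever subscribed, so nobody has to crawl.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A subscriber is called with (topic, payload) whenever content changes.
Subscriber = Callable[[str, str], None]


class PubSubNode:
    """Hypothetical in-memory stand-in for a pub/sub service (not real XMPP)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Subscriber) -> None:
        """Register interest in a topic, e.g. one user's photo stream."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Push payload to every subscriber of topic; return how many were notified."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


node = PubSubNode()
received = []

# The aggregator subscribes once instead of repeatedly crawling the photo site.
node.subscribe("photos/alice", lambda topic, payload: received.append((topic, payload)))

# The photo site publishes an Atom-like entry when something changes.
notified = node.publish("photos/alice", "<entry><title>New balloon photo</title></entry>")
ignored = node.publish("photos/bob", "<entry><title>Nobody is listening</title></entry>")
```

In the real proposal the transport would be XMPP and the payload a proper Atom entry; the sketch only shows the direction of the data flow.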

****
A Groovy vs. JRuby session discussed when you might prefer one language over the other.
The summary was that:
  • Groovy has a better fit with Java, good (and improving) performance, and is best for tight and/or heavy Java code integration. Also, the Groovy compiler generates ordinary Java bytecode that can be packaged into .jar files, useful for "stealth" integration.
  • JRuby's code integration with Java is good but still needs work. It is best for general scripting and light code integration. But you have both the JDK and Ruby libraries to work with.
Note also that Sun has dedicated resources for JRuby (and Jython).

****
Facebook developers mentioned they had more than 400 memcached hosts, using multi-retrieval code
that they've written (and shared).
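
As a rough illustration of why multi-retrieval matters with that many hosts, here is a small sketch that groups keys by the host that owns them and issues one batched request per host rather than one request per key. This is not Facebook's code: FakeCacheHost, host_for_key, and the simple modulo hashing are assumptions made up for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def host_for_key(key: str, hosts: List[str]) -> str:
    """Pick a host for a key with simple modulo hashing (an assumption, not memcached's scheme)."""
    return hosts[zlib.crc32(key.encode("utf-8")) % len(hosts)]


def group_keys_by_host(keys: Iterable[str], hosts: List[str]) -> Dict[str, List[str]]:
    """Bucket the keys so each host receives a single batched request."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        batches[host_for_key(key, hosts)].append(key)
    return dict(batches)


class FakeCacheHost:
    """In-memory stand-in for one cache host; counts the requests it receives."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.requests = 0

    def get_multi(self, keys: List[str]) -> Dict[str, str]:
        self.requests += 1
        return {key: self.data[key] for key in keys if key in self.data}


def multi_get(keys: Iterable[str], cluster: Dict[str, FakeCacheHost]) -> Dict[str, str]:
    """Fetch many keys using at most one request per host."""
    hosts = sorted(cluster)
    results: Dict[str, str] = {}
    for host, batch in group_keys_by_host(keys, hosts).items():
        results.update(cluster[host].get_multi(batch))
    return results


# Build a tiny three-host "cluster" and store ten values on their owning hosts.
cluster = {name: FakeCacheHost() for name in ("cache-a", "cache-b", "cache-c")}
all_keys = [f"user:{n}" for n in range(10)]
for key in all_keys:
    cluster[host_for_key(key, sorted(cluster))].data[key] = f"profile-{key}"

results = multi_get(all_keys, cluster)
total_requests = sum(host.requests for host in cluster.values())
```

Ten keys are fetched with at most three requests here; the saving grows with the number of keys per page.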

There was also some discussion of using TCP vs. UDP, and of high-latency problems caused by East/West coast server traffic.

****

Web Frameworks

There were several discussions centered on frameworks. One new term was SOFEA (Service-Oriented Front-End Architecture). The idea consists mainly of moving more functionality into the client, rather than relying upon a server-side framework to perform everything. (Basically, this is what the RIA model enables with toolkits such as Dojo, Laszlo, and Flex.)

Some quick summary judgements on different frameworks:
  • OpenLaszlo: pummeled by Adobe
  • JavaFX: no one using it (yet at least)
  • Rails: ActiveScaffold adds REST and Ajax; Google Trends shows peak in 2005.
  • Grails: scaffolding not very good yet, but has better performance than Rails
  • Flex: Flex + Rails is an interesting platform, but Flex doesn't yet support all HTML well.

Performance notes:
LinkedIn has a Rails-based Facebook application, named "Bumper Sticker", that supports 1M requests/month.

On a comparison scale, in terms of ms/iteration, we see that:
  • Java, C++: very low, less than 1
  • JRuby: 100
  • Groovy: 215
  • Python: 225
  • PHP: 600

In a session I missed, but saw notes from later, some significant improvements in Ruby performance were reported too. They've introduced a compiler that generates code for LLVM, and it's much faster than the C-based "Matz" interpreter. The name of this project is Rubinius.

****

Mozilla developers discussed ways to implement static analysis of C++ programs.
They have developed a plug-in for GCC that lets them run JavaScript in the compiler, giving
them access to the program graphs directly. This is the "Dehydra" project.

They're already using this technique to refactor some of their existing code - and they're
actually converting it into JavaScript - more on this in a minute. Now that they have access to the program graph, they're looking for other things they can do to the code too: security analysis, bug analysis, standards enforcement, etc.

About the JavaScript conversion: they're using trace compilation in the interpreter, which takes
care of performance bottlenecks. It doesn't help code that executes only once, though. Still, they claim that
much of their code is easily translated into JS, and it's just as fast as the C++ and safer. The target language is JS 2.0, which allows for class/struct within traditional JS objects, making
them safe for C access.

****

The Open Microblogging discussions were interesting. 'identi.ca' is an open source, federated service (it supports multiple servers) that uses the Twitter API. The project's code is 'laconi.ca'. It currently uses HTTP between servers, but they know this won't scale, so they intend to use the XMPP Pub Sub idea. There's a lot of interest in this technology, and some good ideas.

****

Mark Shuttleworth from Canonical (Ubuntu's business org) gave an interesting talk about their development practices. It's a mix of lean and agile techniques. They really try to amplify learning and not specialization. "Decide late, deliver early" was mentioned. But then later he talked about how knowledge and expertise were more important than colocation, so obviously some level of specialization occurs.

Some of the other practices mentioned:
  • cadence/cycle
  • track bugs, features, ideas
  • branch/merge: keeps the trunk pristine; merging is important.
  • code review, but they stay away from voting (too divisive)
  • automated tests: unit, integration, utilization, full app, profile usage.
  • pre-commit testing, with trunk locked to everyone but a robot that runs the tests before each commit.
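
The last practice, a robot that guards trunk, can be sketched roughly as follows. This is my own toy illustration, not Canonical's tooling: the Trunk and MergeRobot classes and the fake test runner are hypothetical, and a real setup would enforce the lock in the version control system rather than by convention.

```python
from typing import Callable, List


class Trunk:
    """The protected branch; by convention only the robot appends to it."""

    def __init__(self) -> None:
        self.commits: List[str] = []


class MergeRobot:
    """Hypothetical gatekeeper: a change lands on trunk only if the tests pass with it."""

    def __init__(self, trunk: Trunk, run_tests: Callable[[List[str]], bool]) -> None:
        self._trunk = trunk
        self._run_tests = run_tests

    def submit(self, change: str) -> bool:
        """Test trunk plus the proposed change; commit it only on success."""
        candidate = self._trunk.commits + [change]
        if self._run_tests(candidate):
            self._trunk.commits = candidate
            return True
        return False


def fake_test_suite(commits: List[str]) -> bool:
    """Stand-in for a real test run: any change labelled 'broken' fails."""
    return not any("broken" in commit for commit in commits)


trunk = Trunk()
robot = MergeRobot(trunk, fake_test_suite)

landed_first = robot.submit("add login page")
landed_second = robot.submit("broken refactoring")
landed_third = robot.submit("fix typo in docs")
```

Because the rejected change never reaches trunk, later submissions are always tested against a known-good state.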
****
There was a good discussion on Python & C++ integration using the SWIG modules.
SWIG is now based on a full compiler that reads C/C++ declarations and generates C extensions that allow Python access. Python code can even extend and override C++ classes.
One might have to clean up the headers a little, but often it works just fine. While it's not the best generated code, it works and cuts out most of the work that would otherwise need to be done by hand.

As an example, wxPython is built on it: 6M LOC, 90-95% of it generated by SWIG, enabling one person to do most of the maintenance. (I experimented with SWIG 2-3 years ago and wasn't encouraged; looks like it's improved!)

****
A different kind of discussion came from an engineer at Meebo, who described how they implemented a hiring process, and the trials & tribulations they went through trying to "staff up". One unusual practice they instituted was a "simulation" where candidates did a 4-hour exercise reflecting "everyday" tasks. (Can't have candidates do real work though.)

****
There was the big announcement that Microsoft was becoming a Platinum sponsor of the Apache Software Foundation, as well as assurances that they are continuing to evaluate and license "open source" technology. The PFIF/Samba agreement, PHP support in Win2008 (ADODB), and Ruby libs were all mentioned.

****