Sunday, August 1, 2010

OSCON 2010 report

OSCON 2010 Summary

There were several overall themes to the sessions I attended at OSCON this year, summarized below. The conference had a good feel to it, although I found myself routinely conflicted over which session to attend (and didn't always make the right choice). Too many interesting topics is better than too few, though.

Scala Summit and sessions
(a recap of my earlier post on this)
A full day was spent on the Scala language, its environment, and people's experience using it, and a few regular sessions covered it as well. Overall they were quite interesting and renewed my interest in exploring Scala again. (But with caveats based upon my earlier experience that I've heard echoed repeatedly: Scala has complex subtleties, perhaps akin to the metaprogramming models expressed in C++! It is interesting, and appears to be a good fit for some projects, but it's not simple.)

For me the big takeaways were:
  • Use the Akka concurrency framework to work around the problems with the standard library's actors. It supports event-based and thread-based models and is very lightweight and fast. (http://akkasource.org)
  • Use the Simple Build Tool (sbt); it's the best way to build and manage Scala projects. It uses Ivy for dependency management and builds a Maven-compatible hierarchy. Much easier than integrating with existing Eclipse and Maven projects.
  • Android: the toolchain is fairly complicated, but Scala is usable for Android development.
  • Don't bring Ruby on Rails conventions and patterns into Scala; it's just not the same.
  • Expect problems hiring (and training the reluctant) developers.

Big Data and the Processing of it (aka NoSQL - Not Only SQL)
  • Good talk about patterns of database scalability (both RDBMS and NoSQL). One comment that stuck with me was that sharding is not always a good option - sometimes it's better to divide operations and duplicate the data.
  • MongoDB was the subject of much discussion (and repeated Twitter jokes) - mainly because it's implemented using virtual memory and people don't trust it, so redundancy is required. It's an object store, with none of the other overhead of an RDBMS. No joins and no complex transactions == horizontal scalability. SourceForge is using it now.
  • Pig seems to hold a lot of promise: it's a high-level data flow language that compiles down into map-reduce jobs. (Aside: when the speaker showed his slide of all the Java code a few lines of Pig replaced, a baby in the audience started crying. Twitter erupted.) The speaker estimated that it requires 5% of the code required by Java, with a similar reduction in developer time, and that it's now within 25% of the execution time of a hand-coded map-reduce job.
  • Scribe, the logging system used internally at Facebook, handles 130TB per day. The talk covered a little about the architecture variations and how popular it was internally.
  • Mahout: an overview of the open-source machine learning library, with a bit about scaling it on Hadoop and some of the different algorithms it currently supports.
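
To make the Pig comparison above a little more concrete, here is a minimal sketch of the map, shuffle, and reduce phases that a word-count job goes through, written as plain single-process Python. It is an illustration only: it is not Pig Latin and not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an iterable of text lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["pig compiles to map reduce", "map reduce jobs"]))
```

A real job spreads these phases across many machines and needs job setup and input/output handling on top. A high-level data flow language is meant to generate that plumbing for you, which is presumably where the code savings come from.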

Client-side Development
  • PhoneGap promises a cross-platform mobile development kit. It's for writing web applications that have access to the native device: a native wrapper hosts a web browser, and your web application runs inside it. The web application uses phonegap.js to access the native environment. There are still UI issues between platforms, and debugging is a pain. The speaker also favored QUnit, XUI instead of jQuery, and Lawnchair for client-side persistence.
  • Android talks were sprinkled throughout, but the big event was an extra workshop that evening that required a separate signup. The 300+ developers that attended were rewarded with in-depth tutorials on Android UI design and asynchronous programming techniques, and a Nexus One device to test it all out on.
  • JavaScript had many sessions and was a recurring theme throughout the conference. At one point, I heard the phrase "JavaScript is the language of the web, deal with it!"

Server-side Topics
  • Tomcat security covered many of the vulnerabilities in Tomcat 6 that were fixed in Tomcat 7. XSS attacks are growing, and one has to keep adding filters for them. Tomcat 7 allows regular expression filters to help with this.
  • Django was covered in a few sessions. I learned a little more about its deployment, which looks pretty typical once you try to scale it up.
  • Spring 3 Framework: I missed some of this talk, but had a chance to talk with the presenter afterwards and review the slides. The most interesting parts were their growing support for Dojo, continued support for Flex, and their new Roo tool, which allows one to quickly create and scaffold a Spring-compatible server, much as Rails does. No more copy and prune! At least for development sites.
  • Chef is an intriguing system for deploying applications and systems software to various systems in a large network. Looks a lot nicer than the one commercial offering I've had experience with. This might be a good way to spin up development servers quickly as well. Supports a variety of systems.

Miscellaneous Topics

  • Mirah language was introduced by Charles Nutter. It's a Ruby-like language with a twist: it compiles directly into Java bytecode or source, and does not require a runtime library. Basically it's (mostly) Ruby with static typing.
  • Concurrency topics continued, with a presentation by Tim Bray capturing much of the essence. Basically it's the functional vs procedural programming and event-based vs thread-based arguments. Functional and event-based approaches are gaining interest and racking up impressive results. Node.js in JavaScript and EventMachine in Ruby are two popular examples. Tim's presentation was very judgemental, although he favored an Erlang model at least (message-passing actors). Perhaps Go will help there. (I am not yet convinced that event loops are fundamentally better than thread-based models on modern multicore systems.)
  • Go language, presented by Rob Pike, both in sessions and keynotes, garnered new interest from me. (Actually, I've pretty much ignored it, thinking the last thing the world needed was another C-based language!) It's simple like C, but has promising concurrency constructs.
  • Testing was covered in a couple of sessions. One tool I need to look at is Cucumber. A speaker commented that they thought mocks were useless because they decouple your tests from reality; the tests then fail and are ignored. Another point was the need to keep tests fast; otherwise people push them off to the CI servers and then just ignore the messages. Also, think "outside in": if you are too close to the model or implementation you're not really testing functionality.

There were separate tracks on Health Care computing, Cloud and Emerging Languages. Most of these sessions were being recorded, so I skipped them when there was an interesting conflict. I hope to catch up on some of them later.
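
For readers who haven't met the message-passing actor model mentioned in the concurrency notes above, here is a minimal sketch of the idea using only Python's standard `threading` and `queue` modules. It is a stand-in for illustration, not how Erlang, Akka, or Go implement it; `CounterActor`, `ask_total`, and the message tuples are names invented for this example.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: private state, one mailbox, one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._total = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Queue a message for the actor; the sender does not wait for the work."""
        self._mailbox.put(message)

    def _run(self):
        # Messages are handled one at a time, so no lock is needed on _total.
        while True:
            kind, payload = self._mailbox.get()
            if kind == "add":
                self._total += payload
            elif kind == "get":
                payload.put(self._total)  # reply on the queue the sender supplied
            elif kind == "stop":
                break

    def stop(self):
        """Ask the actor to finish and wait for its thread to exit."""
        self.send(("stop", None))
        self._thread.join()


def ask_total(actor):
    """Send a 'get' message and block until the actor replies."""
    reply = queue.Queue()
    actor.send(("get", reply))
    return reply.get(timeout=5)


if __name__ == "__main__":
    actor = CounterActor()
    for n in range(1, 101):
        actor.send(("add", n))
    print(ask_total(actor))
    actor.stop()
```

The point of the design is that the counter is never shared: other threads can only send messages, and the mailbox serializes them, so there are no locks around the state itself.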

The keynotes: make sure you watch the videos by Simon Wardley (talking about innovation, process, commodities and services), and Rob Pike (talking about the reasons why we need a new programming language such as Go).

1 comment:

rick said...

on the trip from Seattle to Portland, I took a detour through Mount St Helens National Monument. A few pix from it here:
http://www.flickr.com/photos/rgordon/sets/72157624620471790/

on the way back, I detoured through the Gorge, photos forthcoming (I have a backlog of things to do.)