My latest project at work is Cassandra, a distributed, eventually consistent, column-oriented data store. It sits somewhere between Amazon’s Dynamo (Cassandra’s original author worked on Dynamo) and Google’s BigTable. It was developed as an internal application at Facebook, later open sourced, and is now an Apache Incubator project.
The external interface to Cassandra is Thrift-based. Thrift is a framework for creating network services that communicate using a compact binary data format. It’s similar to Google’s Protocol Buffers, but with more of a focus on RPC and much greater language coverage. The bottom line: any application that uses Cassandra for structured data storage is going to need Thrift. So I filed an ITP (Intent To Package) and have started work on packaging it for Debian.
Thrift is an interesting project to package, as it has an architecture-specific application (written in C++), 6 architecture-specific and 5 architecture-independent libraries, and covers 12 different languages. That’s right, 12.
I’m still somewhat undecided on a game plan; the options I’ve considered so far are:
1. Convince upstream to split their source tree and distribute all of these libraries separately, allowing them to be packaged by people with the skills and/or motivation for each.
2. Split the source myself and package the bits that are most important to me.
3. One source package based on the official upstream release, with binary packages for each of the components that I need or am comfortable maintaining. Folks interested in the parts not packaged could step up to the plate and contribute their time.
4. Best-effort packaging of most or all of the libraries upstream ships, with the proviso that any I’m not comfortable seeing in a release, and for which no one has stepped forward, would be removed prior to Squeeze.
I’ve already taken a stab at #1 and it didn’t seem promising. #2 is still on the table, but I’m a little concerned that it could lead to a mess. #3 and #4 really boil down to the same thing: collaborating with others to package as much as possible while maintaining the standards everyone expects from Debian. I’m currently leaning toward some variation of #3 or #4, probably through the use of collab-maint or a dedicated Alioth project.
For the time being, my efforts can be tracked in Git here, so drop me a line if you’re interested in joining the fun!
NP: Sand and Mercury, The Gathering