NoSQL standouts: New databases for new applications
Cassandra, CouchDB, MongoDB, Redis, Riak, Neo4J, and FlockDB reinvent the data store
The downsides to this experimentation are often ignored in all of the excitement. In the past, developers built cross-database libraries to smooth over the differences between products and make it easier to switch. Many Java developers, for instance, write code that rests on the JDBC libraries, so the databases underneath are pretty close to interchangeable. None of these old libraries works with the new databases, which thumb their noses at the old orthodoxies. Although many of the projects share similar approaches, moving from one to another can require more than a few lines of rewriting. (This may be changing: at least one project, Hibernate OGM, aims to create Hibernate bindings for the major NoSQL databases. It could make it possible to point a Hibernate-based application at any NoSQL database it supports.)
To make matters worse, many ancillary tools are missing. There's a very fertile category of report-generating tools that work with nearly any SQL database, a fact made possible by the boring, old standard query language. None of the new databases will work with these tools out of the box, and they may never work without plenty of sweat. There may be hundreds or perhaps thousands of packages that work with SQL and only a few that work with the NoSQL databases.
There are some indications that this kind of interchangeability will take a long time to emerge in the NoSQL space, if it appears at all. All of that experimentation is generating features that don't overlap very easily. Although it wouldn't be hard to write basic routines that abstract away the keys and values, anything more sophisticated would start to fray quickly. The query languages, for instance, are very different.
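To make the point about "basic routines" concrete, here is a minimal sketch of what such a key-value abstraction might look like in Python. The names `KeyValueStore`, `InMemoryStore`, `save_status`, and `load_status` are invented for illustration and do not come from any real library; a real adapter would wrap a specific database client where the in-memory dictionary sits.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical lowest-common-denominator interface (not a real library)."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...

    @abstractmethod
    def delete(self, key: str) -> None:
        ...


class InMemoryStore(KeyValueStore):
    """Stand-in backend; a real adapter would call a database client here."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is treated as a no-op.
        self._data.pop(key, None)


def save_status(store: KeyValueStore, user: str, text: str) -> None:
    # Application code sees only put/get/delete, never the backend.
    store.put(f"status:{user}", text.encode("utf-8"))


def load_status(store: KeyValueStore, user: str) -> Optional[str]:
    raw = store.get(f"status:{user}")
    return raw.decode("utf-8") if raw is not None else None
```

Anything beyond these three calls, such as secondary indexes, range scans, or map-reduce queries, has no obvious common form across the databases, which is where an abstraction like this would start to fray.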
To get some feel for the options in this incredibly fertile space, I spent some time installing these packages and inserting some rows. It was, for the most part, simple, fun, and relatively fast. All of the packages are relatively stable and useful -- for the right projects. But none of them are as feature-rich or sophisticated as the best commercial SQL tools.
NoSQL databases: Cassandra
Facebook needed something fast and cheap to handle the billions of status updates, so it started this project and eventually moved it to Apache where it's found plenty of support in many communities. It's not just for Facebook any longer. Many of the committing programmers come from other companies, and the project chair works at DataStax.com, a company devoted to providing commercial support for Cassandra.
The heritage of the Cassandra project is obvious because it's a good tool for tracking lots of data, such as status updates at Facebook. The tool helps create a network of computers that all carry the same data. Each machine is meant to be equal to the others, and all of them should end up being consistent once the data propagates around the P2P network of nodes, though it's not guaranteed. The key phrase is "eventual consistency," not "perfect consistency." If you've watched your status updates disappear and reappear on Facebook, you'll understand what this means.
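The idea of eventual consistency can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not Cassandra's actual replication protocol: the `Replica` class is hypothetical, each write carries a caller-supplied timestamp, and conflicts are resolved by keeping the newest timestamp.

```python
from typing import Dict, Optional, Tuple


class Replica:
    """Toy node holding (timestamp, value) per key. Not Cassandra code."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Tuple[int, str]] = {}

    def write(self, key: str, value: str, timestamp: int) -> None:
        # Accept the write only if it is newer than what this node holds.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry is not None else None

    def sync_with(self, other: "Replica") -> None:
        # Exchange entries in both directions; the newer timestamp wins.
        for key, (ts, value) in list(self.data.items()):
            other.write(key, value, ts)
        for key, (ts, value) in list(other.data.items()):
            self.write(key, value, ts)


a, b, c = Replica("a"), Replica("b"), Replica("c")

# A client writes to node a only.
a.write("status:alice", "At lunch", timestamp=1)

# Node c has not heard about the write yet, so a read there comes back empty.
stale = c.read("status:alice")

# The update spreads from node to node.
a.sync_with(b)
b.sync_with(c)

# Once propagation finishes, every node agrees.
fresh = c.read("status:alice")
```

Between the write and the final sync, a reader who happens to hit node c sees nothing, which is the disappearing-and-reappearing status update in miniature.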
The tool runs in Java as a separate process waiting for interaction. There's already a collection of higher-level libraries for Java, Python, Ruby, and PHP, as well as several other languages.
Using Cassandra seems relatively simple, but I still found myself getting hung up on several barriers, such as defining a keyspace (the top-level namespace that holds the column families). Getting up to speed takes more than a few minutes because there are more than just the basic routines for storing collections of values. Cassandra is happy with a sparse matrix where each row stores only a few of the possible columns, and it builds the indices with this in mind.
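The nesting of keyspace, column family, row, and column is easier to see as plain data structures. The sketch below is a toy model in ordinary Python dictionaries, not the Cassandra client API; `ToyKeyspace` and the sample names (`demo`, `users`, `alice`, `bob`) are made up for illustration.

```python
from typing import Dict


class ToyKeyspace:
    """Toy model: keyspace -> column family -> row key -> {column: value}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.column_families: Dict[str, Dict[str, Dict[str, str]]] = {}

    def insert(self, family: str, row_key: str, column: str, value: str) -> None:
        # Rows and column families are created on first use.
        rows = self.column_families.setdefault(family, {})
        rows.setdefault(row_key, {})[column] = value

    def get_row(self, family: str, row_key: str) -> Dict[str, str]:
        # Return a copy containing only the columns this row actually stores.
        return dict(self.column_families.get(family, {}).get(row_key, {}))


ks = ToyKeyspace("demo")
ks.insert("users", "alice", "email", "alice@example.com")
ks.insert("users", "alice", "city", "Boston")
ks.insert("users", "bob", "email", "bob@example.com")

# The rows are sparse: bob has no "city" column at all, not an empty one.
alice_row = ks.get_row("users", "alice")
bob_row = ks.get_row("users", "bob")
```

Unlike a SQL table, no row is padded out with nulls for columns it lacks, which is the sparse-matrix layout described above.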