NoSQL standouts: New databases for new applications
Cassandra, CouchDB, MongoDB, Redis, Riak, Neo4J, and FlockDB reinvent the data storeFollow @peterwayner
The project is shepherded by Basho, a company that provides both open source and enterprise versions of Riak. The open source version appears quite feature-rich. The main differences in the enterprise version are a slicker Web-based administration tool and the availability of high-speed, internode communication across data centers. And only the enterprise version can use SNMP.
NoSQL databases: Neo4J
If there's one application that's most different in this collection, it's Neo4J, a tool optimized to store graphs instead of data. The Neo4J folks use the word "graph" like a computer scientist to mean a network of nodes and connections. Neo4J lets you fill up the data store with nodes and then add links between the nodes that mean things. Social networking applications are its strength.
The code base comes with a number of common graph algorithms already implemented. If you want to find the shortest path between two people -- which you might for a site like LinkedIn -- then the algorithms are waiting for you.
Neo4J is pretty new, and the developers are still uncovering better algorithms. In one recent version, they bragged about a new caching strategy: searching algorithms will run much faster because Neo4J is now caching the node information.
They've also added a new query language with pattern matching that looks a bit like XSL. You can search a graph until you identify nodes with the right type of data. It is a new syntax to learn.
The Neo4J project is backed by Neo Technology, which offers commercial versions of the database with more sophisticated monitoring, fail-over, and backup features.
NoSQL databases: FlockDB
If someone out there is writing code, someone else out there is complaining that the code is too complicated. It should be no surprise that some people think Neo4J is too intricate and sophisticated for what needs to be done. We know that Neo4J has truly arrived because the FlockDB fans are clucking about how FlockDB is simpler and faster.
FlockDB is a core part of the Twitter infrastructure. It was released by Twitter more than a year ago as an open source project under the Apache license. If you want to build your own Twitter, you can also download Gizzard, a tool for sharding data across multiple instances of Flock. Both tools are ready and waiting to run in a JVM.
Although many of us would call FlockDB a graph database because it stores relationships between nodes, some think that the term should apply only to sophisticated tools like Neo4J. Did someone start following someone else? Well, you can link up Flock's nodes with data such as the time that the relationship began. That part is like Neo4J. Where Flock differs is how deeply you can query this data. FlockDB takes a pair of nodes and gives you the data about the connection. Everything else is up to you. Neo4J not only enables all types of graph-walking algorithms, but it provides them as services. FlockDB uses the word "non-goal" for these multihop queries, meaning that the developers have no interest in supporting them.
The code is pretty new, and it doesn't seem to be attracting the same kind of widespread attention as some of the other projects. All of the recent commits have come from Twitter employees, and I wasn't able to find anyone offering FlockDB hosting as a service. FlockDB still seems to be mainly a Twitter project.