NoSQL standouts: New databases for new applications

Cassandra, CouchDB, MongoDB, Redis, Riak, Neo4J, and FlockDB reinvent the data store

The write operations, for instance, can include a parameter that asks Riak to confirm when the data has been propagated successfully to any number of the machines in the cluster. If you don't want to trust just one machine, you can ask it to wait until 2, 3, or 54 machines have written the data before sending the acknowledgment. This is why the team likes to toss around its slogan: "Eventual consistency is no excuse for losing data."
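To make the acknowledgment idea concrete, here is a minimal sketch in Python. It is a toy simulation, not Riak's client API: the `Node` class, the `quorum_write` function, and the `w` argument are hypothetical names chosen for illustration, and the "machines" are just in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one machine in the cluster."""
    name: str
    healthy: bool = True
    store: dict = field(default_factory=dict)

    def write(self, key, value):
        # A real node would persist to disk; this one only updates a dict.
        if not self.healthy:
            return False
        self.store[key] = value
        return True


def quorum_write(nodes, key, value, w):
    """Send the write to every replica; acknowledge only if at least w confirm."""
    if not 1 <= w <= len(nodes):
        raise ValueError("w must be between 1 and the number of replicas")
    confirmed = sum(1 for node in nodes if node.write(key, value))
    return confirmed >= w


# Three replicas, one of which is down (an assumption made for the demo).
replicas = [Node("a"), Node("b"), Node("c", healthy=False)]
print(quorum_write(replicas, "user:1", "alice", w=2))  # two replicas can confirm
print(quorum_write(replicas, "user:1", "alice", w=3))  # the third never confirms
```

The trade-off is the one the paragraph describes: a larger `w` means more copies exist before the caller hears back, at the cost of waiting on more machines.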

The data is not simply dumped to disk as plain files. That is one of the options, but it's not the main one. Riak uses a pluggable storage engine (Bitcask by default) that writes the data to disk in its own internal format. There are several other engines to choose from, including a version of InnoDB for those who are nostalgic for MySQL. That way you get InnoDB's belt-and-suspenders approach to storage along with the clustering power of Riak.

When it comes time to fetch the data, Riak offers to resolve any conflicts that might appear. If two nodes end up with different versions of an object, Riak can either choose the most recent update or return both of the objects and leave the decision up to your client code. The second option is very useful for detecting potential errors in the data.
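The two read strategies can be sketched in a few lines of Python. This is an illustration only: the `Version` type, the `resolve` function, and the `allow_siblings` flag are hypothetical names, and plain timestamps are assumed as the way to decide which update is most recent, which is a simplification of what a real cluster would track.

```python
from typing import NamedTuple


class Version(NamedTuple):
    value: str
    timestamp: float  # hypothetical wall-clock time of the update


def resolve(versions, allow_siblings=False):
    """Return the surviving version(s) of one object read from several nodes."""
    # Collapse identical copies, then order what is left from oldest to newest.
    distinct = sorted(set(versions), key=lambda v: v.timestamp)
    if allow_siblings:
        # Hand every distinct version back so the client code can decide.
        return distinct
    # Otherwise keep only the most recent update.
    return distinct[-1:]


old = Version("alice@example.com", 1.0)
new = Version("alice@example.org", 2.0)
print(resolve([old, new, old]))                       # only the newest version
print(resolve([old, new, old], allow_siblings=True))  # both, oldest first
```

Returning both versions costs the client some extra code, but it also means a disagreement between nodes is visible instead of being silently papered over.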

There are many query options. The basic architecture is map and reduce, and you can write the map and reduce functions in either Erlang or JavaScript.
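For readers who haven't met the pattern, this is roughly what a map and reduce job looks like. The sketch is in Python purely for illustration (Riak takes the functions in Erlang or JavaScript), and the record layout, `map_phase`, `reduce_phase`, and `run_job` are all hypothetical names, not part of Riak.

```python
from functools import reduce


def map_phase(record):
    """Emit a (tag, 1) pair for each tag on a stored object."""
    return [(tag, 1) for tag in record["tags"]]


def reduce_phase(counts, pair):
    """Fold one emitted pair into the running totals."""
    tag, n = pair
    counts[tag] = counts.get(tag, 0) + n
    return counts


def run_job(records):
    # In a cluster the map step would run on each node that holds the data;
    # here everything runs in one process.
    emitted = [pair for record in records for pair in map_phase(record)]
    return reduce(reduce_phase, emitted, {})


posts = [{"tags": ["nosql", "riak"]}, {"tags": ["nosql"]}]
print(run_job(posts))  # tag counts across both posts
```

The appeal of the split is that the map step needs to see only one object at a time, so it can run wherever the data already lives.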

The project is shepherded by Basho, a company that provides both open source and enterprise versions of Riak. The open source version appears quite feature-rich. The main differences in the enterprise version are a slicker Web-based administration tool and the availability of high-speed, internode communication across data centers. And only the enterprise version can use SNMP.

NoSQL databases: Neo4J
If there's one application that's most different in this collection, it's Neo4J, a tool optimized to store graphs rather than rows or key-value pairs. The Neo4J folks use the word "graph" like a computer scientist to mean a network of nodes and connections. Neo4J lets you fill up the data store with nodes and then add links between the nodes that carry meaning of their own. Social networking applications are its strength.

The code base comes with a number of common graph algorithms already implemented. If you want to find the shortest path between two people -- which you might for a site like LinkedIn -- then the algorithms are waiting for you.
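To show what such an algorithm does, here is a minimal breadth-first search in Python. It is not Neo4J's implementation or API; the graph is assumed to be a plain dictionary that maps each person to a list of connections, and `shortest_path` is a hypothetical helper.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the shortest chain of people from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # who led us to each person we have seen
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend in previous:
                continue  # already reached by a path that is no longer
            previous[friend] = person
            if friend == goal:
                # Walk the breadcrumbs back to the start, then reverse them.
                path = [friend]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(friend)
    return None


network = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}
print(shortest_path(network, "ann", "eve"))
```

Because the search fans out one hop at a time, the first route that reaches the goal is guaranteed to use the fewest connections.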

Neo4J is pretty new, and the developers are still uncovering better algorithms. In one recent version, they bragged about a new caching strategy: searching algorithms will run much faster because Neo4J is now caching the node information.

They've also added a new query language with pattern matching that looks a bit like XSL. You can search a graph until you identify nodes with the right type of data. It is a new syntax to learn.

The Neo4J project is backed by Neo Technology, which offers commercial versions of the database with more sophisticated monitoring, fail-over, and backup features.

NoSQL databases: FlockDB
If someone out there is writing code, someone else out there is complaining that the code is too complicated. It should be no surprise that some people think Neo4J is too intricate and sophisticated for what needs to be done. We know that Neo4J has truly arrived because the FlockDB fans are clucking about how FlockDB is simpler and faster.

FlockDB is a core part of the Twitter infrastructure. It was released by Twitter more than a year ago as an open source project under the Apache license. If you want to build your own Twitter, you can also download Gizzard, a tool for sharding data across multiple instances of Flock. Both tools are ready and waiting to run in a JVM.

Although many of us would call FlockDB a graph database because it stores relationships between nodes, some think that the term should apply only to sophisticated tools like Neo4J. Did someone start following someone else? Well, you can link up Flock's nodes with data such as the time that the relationship began. That part is like Neo4J. Where Flock differs is how deeply you can query this data. FlockDB takes a pair of nodes and gives you the data about the connection. Everything else is up to you. Neo4J not only enables all types of graph-walking algorithms, but it provides them as services. FlockDB uses the word "non-goal" for these multihop queries, meaning that the developers have no interest in supporting them.
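The difference is easy to see in a toy model. The Python sketch below is not FlockDB's API; `EdgeStore`, its methods, and `connected_through` are hypothetical names, and the store is just a dictionary keyed by a pair of node IDs.

```python
class EdgeStore:
    """Hypothetical single-hop store: one record per (source, destination) pair."""

    def __init__(self):
        self._edges = {}

    def add(self, source_id, destination_id, created_at):
        # Keep the time the relationship began alongside the edge itself.
        self._edges[(source_id, destination_id)] = {"created_at": created_at}

    def get(self, source_id, destination_id):
        """Return the data about one connection, or None if it does not exist."""
        return self._edges.get((source_id, destination_id))


def connected_through(store, source_id, middle_id, destination_id):
    """A two-hop check that the client assembles itself from single lookups."""
    return (store.get(source_id, middle_id) is not None
            and store.get(middle_id, destination_id) is not None)


follows = EdgeStore()
follows.add(1, 2, created_at=1300000000)  # user 1 follows user 2
follows.add(2, 3, created_at=1300000500)  # user 2 follows user 3
print(follows.get(1, 2))
print(connected_through(follows, 1, 2, 3))
```

Anything beyond one hop lives in `connected_through`, which is client code. That is the work a tool like Neo4J would do on the server.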

The code is pretty new, and it doesn't seem to be attracting the same kind of widespread attention as some of the other projects. All of the recent commits have come from Twitter employees, and I wasn't able to find anyone offering FlockDB hosting as a service. FlockDB still seems to be mainly a Twitter project.

NoSQL databases: How do you choose?
There's no easy answer. Most shops would be happy with any of them, even if they select the worst one for their needs. Choosing the best, though, is a bit harder because a good developer will want to balance the strength of the project, the availability of commercial support, and the quality of the documentation with the quality of the code.

The greatest divergence is in the extras. All of them will store piles of keys with their values, but the real question is how well they split the load across servers and how well they propagate changes across them. Then there's the question of hosting. The idea of a cloud service that will do all of the maintenance for you is seductive.

The stakes are higher because switching is more difficult than it is with SQL databases. There's no standard query language in this world, nor is there a vast array of abstraction layers like JDBC. These NoSQL databases have the power to lock you in. That's the price for all of the fun and features.

This article, "NoSQL standouts: New databases for new applications," was originally published at InfoWorld.com. Follow the latest developments in application development, data management, cloud computing, and open source at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.
