10 standout NoSQL databases to try

These 10 "second-generation" data stores represent the forefront of the NoSQL revolution

The NoSQL buzzword caught fire just a few short years ago, but we're already well into the second generation of the movement. While the early stacks of code were just experiments, the systems today are much more mature, ready for action, and facing the hard truths of a technology that's come of age -- so much so that some of the best NoSQL data stores have already been rewritten, and a few are even sticking the 2.0 label on the latest version. Here's a list of some of the better-known tools for building fast, scalable repositories for lots of data.

Illustration from Apache

Cassandra

The Apache Cassandra project emerged out of Facebook in 2008 and is now a fully grown tool used for many large data stores and integrated with other popular tools like Solr. It is a hybrid of a column-oriented database and a key/value store. Not every row must have every column, but the columns are grouped into families that make them feel like tables. The system offers tunable replication and consistency. In one recent test, Netflix built a cluster of 288 nodes and found that writes scaled linearly.
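To make the column-family idea concrete, here is a minimal Python sketch of rows that do not all carry the same columns. The `ColumnFamily` class and its methods are hypothetical names invented for illustration; this is not Cassandra's API or storage format.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name: value}. Rows may be sparse."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)

    def insert(self, row_key, columns):
        # Only the columns supplied are stored; nothing is reserved for the rest.
        self.rows[row_key].update(columns)

    def get(self, row_key, column, default=None):
        # Use .get so that reading a missing row does not create it.
        return self.rows.get(row_key, {}).get(column, default)


users = ColumnFamily("users")
users.insert("alice", {"email": "alice@example.com", "city": "Austin"})
users.insert("bob", {"email": "bob@example.com"})  # no "city" column at all

print(users.get("alice", "city"))  # Austin
print(users.get("bob", "city"))    # None
```

Grouping the columns under one family name is what makes the collection feel like a table, even though each row decides for itself which columns it holds.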

Illustration from Luke, a Lucene index editing tool

Lucene/Solr

Most people don't think of Lucene as a database because they use it to index large blocks of text, but it employs much the same model as the other NoSQL data stores. Each document is just a bundle of keys attached to values. Standard applications drop big blocks of text into one of the values, but metadata usually ends up attached to other keys. The queries take words and look for key/value pairs that hold them. Lucene/Solr is, of course, best at queries that aren't limited to exact matches but look for words or parts of words that appear in the blocks.
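The document model described here can be sketched as a tiny inverted index in Python. This is only an illustration of the idea, assuming whitespace tokenization and exact word matches; the function names are made up, and real Lucene adds analyzers, scoring, and partial-word matching on top.

```python
from collections import defaultdict


def build_index(docs):
    """Map each (field, word) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(doc_id)
    return index


def search(index, field, word):
    """Return the sorted ids of documents whose field contains the word."""
    return sorted(index.get((field, word.lower()), set()))


# Each document is a bundle of keys attached to values.
docs = {
    1: {"author": "melville", "body": "Call me Ishmael"},
    2: {"author": "austen", "body": "It is a truth universally acknowledged"},
    3: {"author": "melville", "body": "It is not down on any map"},
}

index = build_index(docs)
print(search(index, "body", "it"))          # [2, 3]
print(search(index, "author", "melville"))  # [1, 3]
```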

Illustration from Riak

Riak

Riak is a flexible key/value store that offers eventual consistency for data stored on a collection of nodes that can grow whenever demand increases. The fun part of working with Riak is writing map/reduce queries in either JavaScript or Erlang. The queries run on each node, gather the results, and can be repeated if you need to use the results to search again. The system also offers full-text indices for Solr-like searching and a control panel for watching over your cluster (shown).
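The shape of a map/reduce query can be sketched in a few lines. Riak itself takes these functions in JavaScript or Erlang; the Python below is only a model of the flow, with a hypothetical three-node cluster represented as plain dictionaries.

```python
from collections import Counter
from functools import reduce

# Hypothetical cluster: each "node" holds part of the key/value data.
nodes = [
    {"u1": {"country": "US"}, "u2": {"country": "DE"}},
    {"u3": {"country": "US"}},
    {"u4": {"country": "FR"}, "u5": {"country": "DE"}},
]


def map_phase(node):
    """Runs against one node: count its local records by country."""
    return Counter(record["country"] for record in node.values())


def reduce_phase(left, right):
    """Merges the partial counts produced by two nodes."""
    return left + right


partials = [map_phase(node) for node in nodes]   # one result per node
totals = reduce(reduce_phase, partials, Counter())
print(sorted(totals.items()))  # [('DE', 2), ('FR', 1), ('US', 2)]
```

Because the map step only looks at data already on its own node, adding nodes adds capacity without changing the query.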

Illustration from CouchDB

CouchDB

CouchDB data arrives in JavaScript's JSON format, its queries are written in JavaScript, and the data goes back in JSON. It's a database built for the Web and the people who program it. (Sidenote: Some use CouchDB offline in the background of mobile apps.) CouchDB stores key/value pairs and propagates them over the nodes, offering eventual consistency. There's also a more commercial cousin, Couchbase, that offers caching, better sharding, incremental queries, better indices, and a few more features.

Illustration from Neo4J

Neo4J

Most NoSQL databases store flexible bundles of keys and values. Neo4J stores relationships between objects, a structure that mathematicians often confusingly call a "graph." The tool includes a number of algorithms for searching and analyzing the relationships, making it possible to look for someone who is a friend of a friend of a friend. These "graph traversal" algorithms save you the trouble of chasing pointers.
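A friend-of-a-friend-of-a-friend search is a breadth-first traversal that stops after three hops. The sketch below shows the idea in plain Python over a made-up friendship graph; it is not Neo4J's API, and the names are invented for illustration.

```python
from collections import deque

# Hypothetical social graph: person -> set of direct friends.
friends = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat", "eve"},
    "eve": {"dan"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: return {person: hops} for everyone within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    del hops[start]  # the starting person is not their own friend
    return hops


print(within_hops(friends, "ann", 3))  # {'bob': 1, 'cat': 2, 'dan': 3}
```

This is the pointer chasing that a graph database does for you, typically with indexes and storage laid out to make each hop cheap.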

Illustration from Oracle

Oracle NoSQL

The wizards at Oracle took one look at the NoSQL movement and decided they needed to have a product that would split up key/value pairs across a collection of nodes. The resulting Oracle NoSQL offers a flexible amount of transaction protection that can range from acknowledging the data is stored on one node to waiting until it is successfully backed up across the network.
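The range of protection described here comes down to how many nodes must confirm a write before it counts as done. A minimal sketch, assuming three hypothetical policy names ("one", "majority", "all") that are illustrative and not Oracle's API:

```python
def acks_needed(policy, replica_count):
    """Translate a (hypothetical) policy name into a required number of acks."""
    if policy == "one":
        return 1
    if policy == "majority":
        return replica_count // 2 + 1
    if policy == "all":
        return replica_count
    raise ValueError(f"unknown policy: {policy}")


def write(replicas, key, value, policy):
    """Store the pair on every reachable replica; succeed if enough acknowledged."""
    needed = acks_needed(policy, len(replicas))
    acks = 0
    for replica in replicas:
        if replica is None:  # None stands in for an unreachable node
            continue
        replica[key] = value
        acks += 1
    return acks >= needed


cluster = [{}, {}, None]  # three replicas, one of them down
print(write(cluster, "k", "v", "one"))       # True
print(write(cluster, "k", "v", "majority"))  # True
print(write(cluster, "k", "v", "all"))       # False
```

The stricter the policy, the more failures the data survives, and the more often a write has to wait or fail when a node is slow or down.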

Illustration from MongoDB

MongoDB

MongoDB has all of the classic features that define NoSQL: key/value storage, JSON-style documents, and flexible replication and sharding across nodes. (Sharding is illustrated.) The data is written using multiversion concurrency control, a technique that keeps older versions of the data around to help maintain consistency in complicated transactions. The user base is large, and there is a wide selection of ancillary tools, no doubt thanks to the open source option (strict AGPL).
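The idea behind multiversion concurrency control can be shown with a toy store that never overwrites a value in place. This is a generic sketch of the technique, not MongoDB's implementation; the `VersionedStore` class and its logical clock are assumptions made for illustration.

```python
class VersionedStore:
    """Toy multiversion store: every write appends a new version."""

    def __init__(self):
        self.clock = 0       # logical commit counter
        self.versions = {}   # key -> list of (commit_time, value), oldest first

    def put(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        """Return a marker that later reads can use to see the data as of now."""
        return self.clock

    def get(self, key, as_of=None):
        as_of = self.clock if as_of is None else as_of
        # Walk from newest to oldest and return the first version visible at as_of.
        for commit_time, value in reversed(self.versions.get(key, [])):
            if commit_time <= as_of:
                return value
        return None


store = VersionedStore()
store.put("balance", 100)
snap = store.snapshot()      # a reader starts here
store.put("balance", 75)     # a writer commits afterward

print(store.get("balance", as_of=snap))  # 100
print(store.get("balance"))              # 75
```

The reader holding `snap` keeps seeing a consistent picture even while writers move on, which is why older versions are worth the extra storage.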

Illustration from Apache

Hadoop (HBase)

While most people think of Hadoop and all of its tools as a mechanism for harnessing the power of many machines, the Hadoop ecosystem also includes a database, HBase, that spreads data out among the nodes. The map/reduce structure of Hadoop is well suited to complicated computational jobs or queries that are farmed out to each node. The field is growing, and newer databases such as Accumulo are enhancing the Hadoop platform.

Illustration of Accumulo sharing from Apache

BigTable clones

Google helped start the NoSQL craze with BigTable, and now several others have built their own implementations that mimic much of the structure. Users of Google's App Engine can squirrel away key/value pairs in the Datastore, Hadoop users can put them in Accumulo, and others can just use Hypertable. All are basic key/value stores with a few extra features added for searching speed.
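One such feature in the BigTable design is keeping rows sorted by key, so that a range of neighboring keys can be read in one pass. The Python sketch below illustrates that single idea with the standard `bisect` module; the `SortedTable` class is hypothetical and ignores everything else these systems do.

```python
import bisect


class SortedTable:
    """Toy key/value table that keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []     # sorted list of row keys
        self._values = {}   # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = SortedTable()
for key in ["com.example/about", "org.apache/accumulo",
            "com.example/index", "com.google/maps"]:
    table.put(key, len(key))

# '0' is the character right after '/', so this range covers the whole prefix.
rows = table.scan("com.example/", "com.example0")
print([k for k, _ in rows])  # ['com.example/about', 'com.example/index']
```

Choosing keys so that related rows sort next to each other is what turns a plain key/value store into something you can search quickly.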

Illustration from DynamoDB

DynamoDB

Amazon Web Services offers more ways to store data than there are fingers on one hand. DynamoDB is the NoSQL solution that takes key/value pairs and spreads them out across servers in three different zones where all the data is stored on SSDs. If you anticipate more traffic, DynamoDB will add more servers behind the scenes.
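Spreading keys across servers usually starts with a hash of the key. The sketch below shows the simplest version, hash modulo server count, and how the load evens out when servers are added. It is an illustration only: the function name and key format are made up, and it makes no claim about how DynamoDB places data internally.

```python
import hashlib


def server_for(key, server_count):
    """Pick a server index for a key using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % server_count


keys = [f"user-{i}" for i in range(1000)]

for count in (3, 6):
    load = [0] * count
    for key in keys:
        load[server_for(key, count)] += 1
    print(count, "servers:", load)
```

A plain modulo scheme moves most keys whenever the server count changes, which is why production systems tend to use more careful partitioning; the even spread of load is the part this sketch is meant to show.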

Copyright © 2012 IDG Communications, Inc.