NoSQL standouts: New databases for new applications
Cassandra, CouchDB, MongoDB, Redis, Riak, Neo4J, and FlockDB reinvent the data store
NoSQL databases: Redis
Like CouchDB and MongoDB, Redis stores documents or rows made up of key-value pairs. Unlike the rest of the NoSQL world, it stores more than just strings or numbers in the value. A value can also be a sorted or unsorted set of strings linked to a key, a feature that lets Redis offer some sophisticated set operations to the user. There's no need for the client to download the data to compute an intersection when Redis can do it at the server.
This approach leads to some simple structures without much coding. Luke Melia tracked the visitors on his website by building a new set every minute. The union of the last five sets defined those who were "online" at that moment. The intersection of this union with a friends list produced the list of online friends. These sorts of set operations have many applications, and the Redis crowd is discovering just how powerful they can be.
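The per-minute scheme is easy to model. What follows is a minimal sketch that uses plain Python sets as a stand-in for Redis sets; the class and method names are hypothetical, and the comments point to the Redis commands (SADD, SUNION, SINTER) that would do the same work on the server.

```python
from collections import deque


class OnlineTracker:
    """Toy model of the per-minute visitor sets described above."""

    def __init__(self, window_minutes=5):
        # Each entry is the set of visitor ids seen during one minute.
        self.buckets = deque(maxlen=window_minutes)
        self.buckets.append(set())

    def record_visit(self, user_id):
        # Analogous to SADD on the current minute's key.
        self.buckets[-1].add(user_id)

    def next_minute(self):
        # Start a fresh set; the deque drops the oldest one when full.
        self.buckets.append(set())

    def online(self):
        # Analogous to SUNION over the keys for the last few minutes.
        return set().union(*self.buckets)

    def online_friends(self, friends):
        # Analogous to SINTER between that union and a friends set.
        return self.online() & set(friends)


tracker = OnlineTracker()
tracker.record_visit("alice")
tracker.next_minute()
tracker.record_visit("bob")
print(sorted(tracker.online_friends({"alice", "carol"})))  # ['alice']
```

In a real deployment the union and intersection would run inside Redis, so only the final list of online friends crosses the network.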
Redis is also known for keeping the data in memory and only writing out the list of changes every once in a while. Some don't even call it a database, preferring instead to focus on the positive by labeling it a powerful in-memory cache that also writes to disk. Traditional databases are slower because they wait until the disk gets the information before signaling that everything is OK. Redis waits only until the data is in memory, something that's obviously faster but potentially dangerous if the power fails at the wrong moment.
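The trade-off is easier to see in a toy model. The sketch below is not how Redis is implemented; it is a hypothetical store (the `LazyStore` name and its log format are assumptions) that acknowledges a write once memory is updated and only later appends the accumulated changes to a file.

```python
import json
import os
import tempfile


class LazyStore:
    """Toy key-value store: acknowledge after memory, persist later."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}      # the live, in-memory copy
        self.pending = []   # changes not yet written to disk

    def set(self, key, value):
        # Returns as soon as memory is updated; nothing has hit disk yet.
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"

    def flush(self):
        # Append the accumulated changes and force them onto the disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            for key, value in self.pending:
                log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.pending.clear()

    @staticmethod
    def recover(log_path):
        # Rebuild state from the log, as a restart after a crash would.
        data = {}
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = json.loads(line)
                    data[key] = value
        return data


path = os.path.join(tempfile.mkdtemp(), "changes.log")
store = LazyStore(path)
store.set("a", 1)
store.flush()
store.set("b", 2)               # acknowledged, but never flushed
print(LazyStore.recover(path))  # {'a': 1}
```

The write to "b" was reported as OK, yet a crash before the next flush loses it. That window is the price of skipping the disk on every write.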
The project leaders are still exploring how to expand the project, an intriguing decision because there's more than one official version of Redis from the main team. There's even one official build of Redis that comes with a Lua interpreter and a disclaimer saying that "there is no guarantee that scripting works correctly or that it will be merged into future versions of Redis!" Projects like these are never boring.
Redis providers are starting to appear. OpenRedis promises it's "launching soon." Meanwhile, Redis Straight Up charges just $19 per month, plus all of the costs from Amazon's cloud. The service handles the configuration and passes the costs on to you.
NoSQL databases: Riak
Riak is one of the more sophisticated data stores. It offers most of the features found in others, then adds more control over duplication. Although the basic structure stores pairs of keys and values, the options for retrieving them and guaranteeing their consistency are quite rich.
The write operations, for instance, can include a parameter that asks Riak to confirm when the data has been propagated successfully to any number of the machines in the cluster. If you don't want to trust just one machine, you can ask it to wait until 2, 3, or 54 machines have written the data before sending the acknowledgment. This is why the team likes to toss around its slogan: "Eventual consistency is no excuse for losing data."
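As a rough illustration of that acknowledgment rule, here is a small simulation. It does not use Riak's client API; `Replica`, `quorum_write`, and the `w` parameter name are hypothetical, and the sketch contacts replicas one after another, whereas a real cluster would write in parallel and reply as soon as enough acknowledgments arrive.

```python
class Replica:
    """Hypothetical stand-in for one machine in the cluster."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def store(self, key, value):
        # Returns True only if this replica accepted the write.
        if not self.healthy:
            return False
        self.data[key] = value
        return True


def quorum_write(replicas, key, value, w):
    """Send the write to every replica; succeed only with at least w acks."""
    acks = sum(1 for replica in replicas if replica.store(key, value))
    return acks >= w


cluster = [Replica("n1"), Replica("n2"), Replica("n3", healthy=False)]
print(quorum_write(cluster, "user:1", "alice", w=2))  # True
print(quorum_write(cluster, "user:1", "alice", w=3))  # False
```

Raising `w` buys more confidence that the data survives a machine failure, at the cost of waiting on slower or unavailable nodes.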
The data itself is not just written to disk. Well, that is one of the options, but it's not the main one. Riak uses a pluggable storage engine (Bitcask by default) that writes the data to disk in its own internal format. There are several other options, including a version of InnoDB for those who are nostalgic for MySQL. You can get all of the belts and suspenders with the clustering power of Riak.
When it comes time to fetch the data, Riak offers ways to resolve the conflicts that might appear. If two nodes end up with different versions of an object, Riak can either choose the most recent update or return both of the objects and leave the decision up to your client code. This is a very useful option for detecting potential errors in the data.
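The two read behaviors can be sketched in a few lines. This is a simplified model and not Riak's implementation: the `Version` type, the plain timestamp, and the `return_siblings` flag are assumptions made for illustration, and a real store would typically rely on richer version metadata than a single timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # hypothetical per-write timestamp


def read_resolved(versions, return_siblings=False):
    """Resolve divergent copies of one object.

    With return_siblings=False, keep only the most recent write.
    With return_siblings=True, hand every distinct version to the caller.
    """
    distinct = sorted(set(versions), key=lambda v: v.timestamp)
    if not distinct:
        return []
    if return_siblings:
        return distinct
    return [distinct[-1]]


copies = [Version("cart=[book]", 10.0), Version("cart=[book, pen]", 12.5)]
print(read_resolved(copies)[0].value)                    # cart=[book, pen]
print(len(read_resolved(copies, return_siblings=True)))  # 2
```

Picking the latest write is convenient but silently discards the other copy; returning both lets the client merge them, for example by taking the union of two shopping carts.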