NoSQL databases: Riak
Riak is one of the more sophisticated data stores. It offers most of the features found in others, then adds more control over duplication. Although the basic structure stores pairs of keys and values, the options for retrieving them and guaranteeing their consistency are quite rich.
The write operations, for instance, can include a parameter that asks Riak to confirm when the data has been propagated successfully to any number of the machines in the cluster. If you don't want to trust just one machine, you can ask it to wait until 2, 3, or 54 machines have written the data before sending the acknowledgment. This is why the team likes to toss around its slogan: "Eventual consistency is no excuse for losing data."
The data itself is not just written to disk. Well, that is one of the options, but it's not the main one. Riak uses a pluggable storage engine (Bitcask by default) that writes the data to disk in its own internal format. There are several other options, including a version of InnoDB for those who are nostalgic for MySQL. You can get all of the belts and suspenders with the clustering power of Riak.
When it comes time to fetch the data, Riak offers to eliminate any of the errors that might appear. If two nodes end up with different versions of an object, Riak can either choose the youngest update or return both of the objects and leave the decision up to your client code. This is a very useful option for detecting potential errors in the data.
The project is shepherded by Basho, a company that provides both open source and enterprise versions of Riak. The open source version appears quite feature-rich. The main differences in the enterprise version are a slicker Web-based administration tool and the availability of high-speed, internode communication across data centers. And only the enterprise version can use SNMP.
NoSQL databases: Neo4J
If there's one application that's most different in this collection, it's Neo4J, a tool optimized to store graphs instead of data. The Neo4J folks use the word "graph" like a computer scientist to mean a network of nodes and connections. Neo4J lets you fill up the data store with nodes and then add links between the nodes that mean things. Social networking applications are its strength.
The code base comes with a number of common graph algorithms already implemented. If you want to find the shortest path between two people -- which you might for a site like LinkedIn -- then the algorithms are waiting for you.
Neo4J is pretty new, and the developers are still uncovering better algorithms. In one recent version, they bragged about a new caching strategy: searching algorithms will run much faster because Neo4J is now caching the node information.
They've also added a new query language with pattern matching that looks a bit like XSL. You can search a graph until you identify nodes with the right type of data. It is a new syntax to learn.
The Neo4J project is backed by Neo Technology, which offers commercial versions of the database with more sophisticated monitoring, fail-over, and backup features.
NoSQL databases: FlockDB
If someone out there is writing code, someone else out there is complaining that the code is too complicated. It should be no surprise that some people think Neo4J is too intricate and sophisticated for what needs to be done. We know that Neo4J has truly arrived because the FlockDB fans are clucking about how FlockDB is simpler and faster.
FlockDB is a core part of the Twitter infrastructure. It was released by Twitter more than a year ago as an open source project under the Apache license. If you want to build your own Twitter, you can also download Gizzard, a tool for sharding data across multiple instances of Flock. Both tools are ready and waiting to run in a JVM.
Although many of us would call FlockDB a graph database because it stores relationships between nodes, some think that the term should apply only to sophisticated tools like Neo4J. Did someone start following someone else? Well, you can link up Flock's nodes with data such as the time that the relationship began. That part is like Neo4J. Where Flock differs is how deeply you can query this data. FlockDB takes a pair of nodes and gives you the data about the connection. Everything else is up to you. Neo4J not only enables all types of graph-walking algorithms, but it provides them as services. FlockDB uses the word "non-goal" for these multihop queries, meaning that the developers have no interest in supporting them.
The code is pretty new, and it doesn't seem to be attracting the same kind of widespread attention as some of the other projects. All of the recent commits have come from Twitter employees, and I wasn't able to find anyone offering FlockDB hosting as a service. FlockDB still seems to be mainly a Twitter project.
NoSQL databases: How do you choose?
There's no easy answer. Most shops would be happy with any of them, even if they select the worst one for their needs. Choosing the best, though, is a bit harder because a good developer will want to balance the strength of the project, the availability of commercial support, and the quality of the documentation with the quality of the code.
The greatest divergence is in the extras. All of them will store piles of keys with their values, but the real question is how well they split the load across servers and how well they propagate changes across them. Then there's the question of hosting. The idea of a cloud service that will do all of the maintenance for you is seductive.
The stakes are higher because switching is more difficult than it is with the SQL databases. There's no standard query language in this world, nor is there a vast array of abstraction layers like the JDBC. These NoSQL databases have the power to lock you in. That's the price for all of the fun and features.
This article, "NoSQL standouts: New databases for new applications," was originally published at InfoWorld.com. Follow the latest developments in application development, data management, cloud computing, and open source at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.
Having trouble installing and setting up Win10? You aren’t alone. Here are many of the most common...
It's all about knowing how to build an open source community -- plus experience running applications in...
Win7 Update scans got you fuming? Here’s how to make the most of Microsoft’s 'magic' speed-up patch
The proliferation of insecure devices in every facet of our lives will have consequences far beyond the...
From a simple platform for beginners to an expert-level development workbench, there's an IDE for most...
You don't need to buy a new phone to add hours to your battery. All you need is to flip a few switches...
Look to these clever open source tools to keep secrets out of source code, identify malicious files,...