Cutting-edge experiment No. 5: Tapping the speed of NoSQL
Let's face it: We programmers are a lazy bunch. We won't start building something from scratch unless we need to. New tools are usually built around one big new feature, sometimes several.
The only way to get those features is to embrace the new tools. Many of the new NoSQL databases slip effortlessly into the cloud: Give them a rack of machines, and they'll work well across all of them. That's what they were built for and what they do well. They wouldn't exist if they weren't needed.
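How does a database "work well across" a whole rack? Cassandra and Riak, for example, use consistent hashing to decide which machine holds which key. Below is a toy Python sketch of that idea. The `HashRing` class, the node names, and the `vnodes` count are hypothetical; real systems layer replication, failure detection, and rebalancing on top.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Map any string to a large integer position on the hash ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key lives on the first node position
    found clockwise from the key's own position."""

    def __init__(self, nodes, vnodes=100):
        # Give every machine many virtual positions so keys spread evenly.
        points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in points]
        self._owners = [node for _, node in points]

    def node_for(self, key: str) -> str:
        # First position strictly after the key's hash; wrap to 0 past the end.
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owners[idx % len(self._owners)]


ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", ring.node_for(key))
```

Each machine owns only the stretches of the ring just before its own positions, so adding a fourth node moves roughly a quarter of the keys, all of them onto the newcomer, instead of reshuffling everything.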
There is a wide collection of NoSQL projects, each offering a slightly different mix of features, and enumerating them and explaining their differences is beyond the scope of this article. A few of the more popular tools are Cassandra, MongoDB, CouchDB, and Riak. Some companies also offer these tools as hosted services: MongoLab and MongoHQ, for instance, will store your data in MongoDB. Similar hosted versions are available for the others.
The ability to respond like lightning and scale almost as quickly is a great feature, possibly worth rewriting all of your code to take advantage of. But one reason these seductions of the cutting edge seem so great is that we haven't yet felt how they can go wrong. There's usually a dark side, and it takes time to discover, often by accident.
NoSQL databases are no exception. They're fast largely because many of them don't offer iron-clad promises of consistency. They suck up the data and respond with an "All Clear" before they're sure it has been written to disk. That may be adequate for the many websites that traffic in social gossip, where a lost status update means little, but it's not ideal for applications where every write matters.
Find a project where you can afford to play without reservations, and begin tinkering with a few of these NoSQL datastores.
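To make the "All Clear before it's on disk" trade-off concrete, here is a toy Python sketch. `ToyStore`, its `durable` flag, and `recover()` are hypothetical illustrations, not the API of any real NoSQL product; real databases expose related knobs through their own settings (MongoDB's write concern, for instance).

```python
import os
import tempfile


class ToyStore:
    """Hypothetical append-only key-value store that shows the difference
    between acknowledging a write before and after it reaches disk."""

    def __init__(self, path, durable):
        self.path = path
        self.durable = durable
        self.buffer = []  # writes already acknowledged but not yet on disk

    def put(self, key, value):
        line = f"{key}\t{value}\n"
        if self.durable:
            # Slow path: force the data to disk before saying "All Clear".
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(line)
                f.flush()
                os.fsync(f.fileno())
        else:
            # Fast path: keep the write in memory and acknowledge at once.
            self.buffer.append(line)
        return "OK"

    def flush(self):
        # A background job would normally do this every so often.
        with open(self.path, "a", encoding="utf-8") as f:
            f.writelines(self.buffer)
            f.flush()
            os.fsync(f.fileno())
        self.buffer.clear()

    def crash(self):
        # Simulate a power loss: anything still only in memory is gone.
        self.buffer.clear()


def recover(path):
    # Rebuild the key-value map from whatever actually reached the disk.
    data = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                data[key] = value
    return data


with tempfile.TemporaryDirectory() as tmp:
    fast = ToyStore(os.path.join(tmp, "fast.log"), durable=False)
    safe = ToyStore(os.path.join(tmp, "safe.log"), durable=True)
    for store in (fast, safe):
        store.put("status:42", "Having lunch")  # both return "OK"
        store.crash()
    print("fast store after crash:", recover(fast.path))  # {}
    print("safe store after crash:", recover(safe.path))  # {'status:42': 'Having lunch'}
```

Both stores answer "OK" to the same write, but only the one that paid for `fsync` still has the status update after the simulated crash.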
Cutting-edge experiment No. 6: Finding connections with graph databases
The idea of a database was well defined in the last century. You define a table with a list of columns that hold particular data, then insert rows into the database until it's full. The columns might hold integers, decimal numbers, or strings, but that's about all of the flexibility you get.
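A minimal sketch of that row-and-column model, using Python's built-in `sqlite3` module. The `people` table and its rows are hypothetical; `INTEGER`, `REAL` (a floating-point stand-in for decimals), and `TEXT` cover the integers, decimal numbers, and strings mentioned above.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# A classic table: a fixed list of typed columns.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)

# Insert rows with placeholders rather than string formatting.
rows = [(1, "Alice", 10.50), (2, "Bob", 3.25), (3, "Carol", 0.0)]
conn.executemany("INSERT INTO people (id, name, balance) VALUES (?, ?, ?)", rows)
conn.commit()

# Read the rows back in key order.
for row in conn.execute("SELECT id, name, balance FROM people ORDER BY id"):
    print(row)
```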
Graph databases such as Neo4j put a new twist on the idea. You still store your numbers and strings, but now you can create pointers between the records to form networks; Neo4j calls the records nodes and the pointers relationships. If you're storing a social network, the database is ready to record who is friends with whom.
This has always been possible with regular databases: Give each row a key and store the pointers as keys in a column. The power of the graph database appears when you start running queries. A graph database can unpack the network and search it with well-honed graph-search algorithms, without the complex linking and joining a relational database needs. If you want to count how many people are friends of friends of friends, the query engine is there for you. If you want to know how many friendship hops separate two people, the engine can search the network and find the answer.
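Here is a small Python sketch of that contrast, with the standard library's `sqlite3` standing in for a relational database and a plain dictionary standing in for a graph store. The `friends` table, the sample names, and the helper functions are hypothetical; this is not Neo4j's API or query language.

```python
import sqlite3
from collections import deque

# Hypothetical sample data: each pair is an undirected "friends with" link.
FRIENDSHIPS = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
               ("dan", "eve"), ("ann", "fay")]

# Relational approach: pointers stored as keys in a column, followed by joins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (person TEXT, friend TEXT)")
# Store each link in both directions so a join can follow it either way.
conn.executemany(
    "INSERT INTO friends (person, friend) VALUES (?, ?)",
    FRIENDSHIPS + [(b, a) for a, b in FRIENDSHIPS],
)
# Friends of friends takes one self-join; every extra hop adds another.
fof_sql = conn.execute(
    """
    SELECT DISTINCT f2.friend
    FROM friends AS f1
    JOIN friends AS f2 ON f1.friend = f2.person
    WHERE f1.person = ? AND f2.friend != f1.person
    """,
    ("ann",),
).fetchall()

# Graph approach: an adjacency list walked breadth-first.
graph = {}
for a, b in FRIENDSHIPS:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def hop_distances(start):
    """Breadth-first search: fewest hops from start to everyone reachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def hops_between(a, b):
    # None means the two people are not connected at all.
    return hop_distances(a).get(b)


def within_hops(start, k):
    # Everyone 1..k hops away; k=3 covers friends of friends of friends.
    return {p for p, d in hop_distances(start).items() if 0 < d <= k}


print(fof_sql)                        # [('cat',)]
print(hops_between("ann", "eve"))     # 4
print(sorted(within_hops("ann", 3)))  # ['bob', 'cat', 'dan', 'fay']
```

The SQL query needs one more self-join for every extra hop, while the breadth-first search handles any depth with the same loop. That is roughly the kind of traversal graph engines are built to do quickly.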
Neo4j is distributed by Neo Technology in three editions under several licenses. All the searching power is available in the Community edition, which is distributed under GPL 3.0. The Advanced and Enterprise editions add tools for monitoring throughput, synchronizing a cluster, and backing up the database. They are available under the Affero GPL for experimentation and open source projects, but many businesses will be most interested in the commercial licenses, which include support and avoid the requirement to share your code.
The trade-off comes in the collection of features. Graph databases are not as elaborate or as well developed as their more general cousins. They are experts at graph algorithms but don't offer the same depth and breadth of traditional features. Choosing a graph database means giving up some of those features.