Which freaking database should I use?

In the era of big data, good old RDBMS is no longer the right tool for many database jobs. Here's a quick guide to choosing among NoSQL alternatives


Graph databases
Graph databases are less about the volume of data or availability and more about how your data is related and what calculations you're attempting to perform. As Philip Rathle, senior director of product engineering at Neo Technology (maker of Neo4j), told me, graph databases are especially useful when "the data set is fundamentally interconnected and non-tabular. The primary data access pattern is transactional, i.e., OLTP/system of record vs. batch... bearing in mind that graph databases allow relatedness operations to occur transactionally that, in an RDBMS world, would need to take place in batch."

This flies in the face of most NoSQL marketing: One specific reason to choose a graph database is that you need transactions that fit your data structure more correctly than those a relational database offers.

Common uses for graph databases include geospatial problems, recommendation engines, network/cloud analysis, and bioinformatics -- basically, anywhere that the relationships among the data are just as important as the data itself. Graph databases are also important in various financial analysis functions. If you want to find out how vulnerable a company is to a bit of "bad news" for another company, the directness of the relationship between the two can be a critical calculation. Expressing this as a series of SQL statements takes a lot of code and won't be fast, but a graph database excels at this task.
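To make the "directness of the relationship" calculation concrete, here is a minimal Python sketch that runs a breadth-first search over an in-memory adjacency map. The company names, the RELATIONSHIPS map, and the degrees_of_separation function are all hypothetical and exist only for illustration; a real graph database would run this kind of traversal natively rather than in application code.

```python
from collections import deque

# Hypothetical business relationships (supplier, lender, partner),
# stored as undirected edges in an adjacency map.
RELATIONSHIPS = {
    "Acme": {"Globex", "Initech"},
    "Globex": {"Acme", "Umbrella"},
    "Initech": {"Acme"},
    "Umbrella": {"Globex", "Hooli"},
    "Hooli": {"Umbrella"},
    "Stark": set(),
}


def degrees_of_separation(graph, start, target):
    """Return the fewest hops between start and target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        company, hops = queue.popleft()
        for neighbor in graph.get(company, ()):
            if neighbor == target:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


if __name__ == "__main__":
    # Fewer hops suggests more direct exposure to the other company's bad news.
    print(degrees_of_separation(RELATIONSHIPS, "Acme", "Hooli"))
    print(degrees_of_separation(RELATIONSHIPS, "Acme", "Stark"))
```

The same question in a relational schema would typically need one self-join per hop, or a recursive query, which is the extra code and cost the paragraph above refers to.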

You don't need a graph database if your data is simple or tabular. A graph database is also a poor fit if you're doing OLAP or lengthy analysis. Typically, graph databases are paired with an index to allow for better search and lookup, but the graph part has to be traversed; for that, you need a fix on some initial node.
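The split between index lookup and traversal can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the NODES map, the "knows" edge label, the NAME_INDEX dictionary, and the reachable_names function do not correspond to any particular product's API.

```python
# Hypothetical nodes keyed by id, each with a name property and outgoing edges.
NODES = {
    1: {"name": "Alice", "knows": [2, 3]},
    2: {"name": "Bob", "knows": [4]},
    3: {"name": "Carol", "knows": [4]},
    4: {"name": "Dave", "knows": []},
}

# Secondary index: property value -> node id, so finding a node by name
# does not require scanning the graph.
NAME_INDEX = {node["name"]: node_id for node_id, node in NODES.items()}


def reachable_names(start_name, max_hops):
    """Find the start node via the index, then traverse up to max_hops edges."""
    start_id = NAME_INDEX.get(start_name)
    if start_id is None:
        return set()  # no fix on an initial node, so nothing to traverse
    frontier = {start_id}
    found = set()
    for _ in range(max_hops):
        next_ids = set()
        for node_id in frontier:
            next_ids.update(NODES[node_id]["knows"])
        # Keep only nodes not visited on an earlier hop.
        frontier = next_ids - found - {start_id}
        found |= frontier
    return {NODES[node_id]["name"] for node_id in found}


if __name__ == "__main__":
    print(sorted(reachable_names("Alice", 2)))
```

The index answers "where do I start?" and the traversal answers "what is connected to it?"; without the first answer, the second has no entry point.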

Sorting it all out
Graph databases provide a great example of why it's so hard to name these new database types. "NewDB" is my preferred name -- except that, oops, some are as old as or older than the RDBMS. "NoSQL" isn't a great name because some of these databases support SQL, and SQL is really orthogonal to the capabilities of these systems.

Finally, "big data" isn't exactly right because you don't need large data sets to take advantage of databases that fit your data more naturally than relational databases. "Nonrelational" doesn't quite apply, either, because graph databases are very relational; they just track different forms of relationships than traditional RDBMSes.

In truth, these are the rest of the databases that solve the rest of our problems. The marketing noise of past decades, combined with hardware and bandwidth limitations and lower expectations for latency and volume, prevented some of the older kinds of databases from becoming as widely known as RDBMSes.

Just as we shouldn't try to solve all of our math problems with set theory, we shouldn't try to solve all of our data problems with an RDBMS. Today's data problems are getting complicated: The scalability, performance (low latency), and volume needs are greater. To solve these problems, we're going to have to use more than one database technology.

This article, "Which freaking database should I use?," was originally published at InfoWorld.com.

Copyright © 2012 IDG Communications, Inc.
