Bossie Awards 2012: The best open source databases

InfoWorld's top picks in the ever-expanding universe of the back-end data store

Not so long ago, the world of open source databases could be summed up in one five-letter word: MySQL. But that was when we threw everything into a SQL database without giving it a second thought. Now we have NoSQL, horizontal scaling, and a slew of distributed key-value stores playing musical chairs around the CAP Theorem. And wait just a minute -- isn't PostgreSQL starting to look sexy?

Hadoop

Hadoop is the name brand in big data. It also represents the convergence of "clustered storage" systems like Gluster and Ceph with NoSQL. Hadoop is really a collection of projects for solving large and complex data problems; in fact, multiple types of databases and query languages are built on the overall Hadoop framework. Hadoop's complexity is as legendary as its capability, and its lack of high-availability features has both held it back and spawned an ecosystem of commercial add-ons. The next release addresses this limitation, along with a number of performance issues. Since hardly anyone is asking whether we really need all of this, look for Hadoop to become commonplace in the next few years.
-- Andrew Oliver

Cascading and Scalding

Hadoop puts a treasure trove of data at your fingertips, but the process for extracting those riches can be daunting. Cascading provides a thin layer of Java-based data processing functionality atop Hadoop's MapReduce execution layer. It masks the complexity of MapReduce, simplifies the programming, and speeds you on your journey toward actionable analytics. Cascading works with JVM languages like Clojure and JRuby, but we prefer Scalding, a Scala API for Cascading from Twitter. A vast improvement over native MapReduce functions or Pig UDFs, Scalding code is clean and concise. Anyone comfortable with Ruby will find the Cascading/Scala pairing a natural fit.
-- James R. Borck
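
To make concrete what Cascading and Scalding hide, here is a minimal single-process Python sketch of the three phases of a MapReduce job (map, shuffle, reduce) for the classic word count. It uses no Hadoop, Cascading, or Scalding APIs; the function names are hypothetical, and a real Hadoop job would run each phase in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group the emitted values by key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases locally; Hadoop would run them across many machines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

A Scalding job expresses the same computation as a short chain of operations on a pipe (read the lines, flatMap them into words, group by word, count), and Cascading's planner turns that chain into the underlying map and reduce phases. That chaining is where the conciseness praised above comes from.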

PostgreSQL

When Oracle acquired MySQL, reduced the development staff, and more or less killed the open source nature of the project, it reopened a market that MySQL had locked down. PostgreSQL has a much nicer set of drivers and supports both standard ANSI SQL and extended features, in many cases better than MySQL does. On the downside, its long legacy has left it with a multiprocess architecture in a multithreaded era, and its high-availability and clustering features require a lot of elbow grease and leave much to be desired. Yet as organizations look for a community-developed database, one of the oldest starts to look pretty good. Many cloud providers, such as Heroku, have also chosen PostgreSQL as their RDBMS storage option.
-- Andrew Oliver

MySQL and MariaDB

The most widely used open source database for Web apps (and many other things) remains MySQL. Support for multiple storage engines, clustering, full-text indexing, and plenty of other professional features has allowed numerous apps, from WordPress to Movable Type, to rely on MySQL as their default database. Graphical front ends such as phpMyAdmin and Adminer make using the database far less of a chore. And for those seeking to escape the long shadow of Oracle, there's a community fork named MariaDB, maintained by MySQL's original lead developer, Monty Widenius.
-- Serdar Yegulalp

Adminer

Adminer is a great alternative to phpMyAdmin. It's a single PHP file, so it's easy to install. The UI is simpler and more intuitive than phpMyAdmin's, and Adminer fully supports features such as foreign keys, grouping SELECT results, sorting results by multiple columns, easily downloading blob field contents, and editing fields in multiple rows. Adminer can work with MySQL, PostgreSQL, SQLite, Microsoft SQL Server, and Oracle Database, whereas phpMyAdmin supports only MySQL. Adminer even works with older versions of MySQL (4.1 and later) and PHP (4.3 and later).
-- High Mobley

Cassandra

Written in Java and modeled on Google's BigTable, this key-value database is getting more popular by the day. Open source and built to integrate with Hadoop, Cassandra offers a column-family solution to developers who want to move away from the relational model while working with Hadoop. Because it focuses mainly on very fast writes and high availability, Cassandra has slower reads than some alternatives. It is mostly used for logging and real-time analysis.
-- Deep Mistry
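
The trade-off described above, fast writes but slower reads, follows from Cassandra's log-structured storage design: a write is appended to a commit log and applied to an in-memory memtable, which is later flushed to an immutable, sorted SSTable on disk, while a read may have to consult the memtable and several SSTables. The Python sketch below is a toy model of that idea only. The class and method names are hypothetical, not Cassandra APIs, and it leaves out column families, compaction, Bloom filters, tombstones, and replication.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class LogStructuredStore:
    """Toy log-structured store: cheap appends on write, multi-step lookups on read."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # append-only log for crash recovery
        self.memtable: Dict[str, str] = {}  # most recent writes, held in memory
        # Immutable sorted runs ("SSTables"), oldest first, as parallel key/value lists.
        self.sstables: List[Tuple[List[str], List[str]]] = []
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        # A write is an append plus an in-memory update; nothing existing is read or rewritten.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable into a new sorted, immutable run.
        items = sorted(self.memtable.items())
        self.sstables.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}
        self.commit_log = []  # entries for flushed data no longer need replay

    def read(self, key: str) -> Optional[str]:
        # A read checks the memtable, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


if __name__ == "__main__":
    store = LogStructuredStore(memtable_limit=2)
    store.write("a", "1")
    store.write("b", "2")  # second write fills the memtable and triggers a flush
    store.write("a", "3")  # the newer value shadows the flushed one
    print(store.read("a"), store.read("b"), len(store.sstables))  # 3 2 1
```

In Cassandra itself, compaction periodically merges SSTables, and a Bloom filter kept for each SSTable lets a read skip files that cannot contain the requested key, which keeps read cost in check.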

MongoDB

NoSQL? Document database? The first name that comes to mind is MongoDB, and developer 10gen has two things to thank for that. First, MongoDB has strong venture capital backing and, consequently, an extensive marketing strategy. Second, it is the only comparatively mature document database in the NoSQL world. Horizontally scalable through automated sharding and highly available through automatic replication, MongoDB offers a reliable yet simple solution to modern document database problems. The downsides: Working with stored procedures can be difficult, and data manipulation can require writing complex JavaScript code.
-- Deep Mistry
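
MongoDB's automated sharding splits a collection into chunks, each covering a contiguous range of values of a chosen shard key; the mongos router sends each operation to the shard that owns the matching chunk, and a balancer migrates chunks between shards as data grows. The Python sketch below models only that routing table. It is not MongoDB code, uses no driver API, assumes integer shard keys, and its class and method names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    min_key: float  # inclusive lower bound of this chunk's shard-key range
    shard: str  # name of the shard that currently owns the range


class RangeRouter:
    """Toy router: maps a shard-key value to the shard owning the enclosing range."""

    def __init__(self, chunks: List[Chunk]) -> None:
        self.chunks = sorted(chunks, key=lambda c: c.min_key)
        if not self.chunks or self.chunks[0].min_key != float("-inf"):
            raise ValueError("the first chunk must start at -inf so every key is covered")
        self.bounds = [c.min_key for c in self.chunks]

    def shard_for(self, key: int) -> str:
        # The owning chunk is the last one whose lower bound is <= key;
        # its upper bound is implicitly the next chunk's lower bound.
        i = bisect_right(self.bounds, key) - 1
        return self.chunks[i].shard

    def split(self, at: int, new_shard: str) -> None:
        # Split the chunk containing `at` and hand its upper part to new_shard,
        # roughly what a chunk split followed by a balancer migration achieves.
        i = bisect_right(self.bounds, at)
        if self.bounds[i - 1] == at:
            raise ValueError(f"a chunk already starts at {at}")
        self.bounds.insert(i, at)
        self.chunks.insert(i, Chunk(at, new_shard))


if __name__ == "__main__":
    router = RangeRouter([Chunk(float("-inf"), "shard-a"), Chunk(1000, "shard-b")])
    router.split(500, "shard-c")
    print(router.shard_for(42), router.shard_for(750), router.shard_for(5000))
    # shard-a shard-c shard-b
```

Range-based placement keeps nearby key values on the same shard, which helps range queries, but it can also funnel monotonically increasing keys (such as timestamps) onto a single shard, so the choice of shard key matters.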

Couchbase

Although Couchbase began as a fork of CouchDB, it has become more of a full-fledged data product and less of a loose framework than CouchDB. Its transition to a document database will give MongoDB a run for its money. It is multithreaded per node, which can be a major scalability benefit -- especially when hosted on custom or bare-metal hardware. With some nice integration features, including integration with Hadoop, Couchbase is a great choice for an operational data store.
-- Andrew Oliver

Neo4j

Built for interconnected data, Neo4j provides a reliable Java-based platform for tackling highly interconnected data problems. It offers full ACID transaction support -- rare in a NoSQL database -- along with a SQL-like query language called Cypher and the Gremlin scripting language for graph traversals. Neo4j is best used to model highly complex, interconnected networks such as network topologies, social networks, and conditional access control problems accurately and efficiently, and it provides indexes on nodes and relationships. Direct path calculations that take hundreds of lines of code against an RDBMS take two lines in Neo4j.
-- Deep Mistry
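
Neo4j stores relationships as direct links between nodes, so a path query follows those links instead of repeating joins, and in Cypher a shortest-path search is a single shortestPath pattern in a MATCH clause. To show the kind of work such a query does, here is a plain Python breadth-first search for the shortest unweighted path over an in-memory adjacency list. It is an illustration under those assumptions, not Neo4j's implementation or API, and the sample social graph is made up.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over an unweighted adjacency list; returns the node path or None."""
    if start == goal:
        return [start]
    previous: Dict[str, str] = {}  # child -> parent, used to rebuild the path
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            previous[neighbor] = node
            if neighbor == goal:
                # Walk parent links back to the start, then reverse.
                path = [goal]
                while path[-1] != start:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    friends = {
        "alice": ["bob", "carol"],
        "bob": ["alice", "dave"],
        "carol": ["alice", "erin"],
        "dave": ["bob", "frank"],
        "erin": ["carol"],
        "frank": ["dave"],
    }
    print(shortest_path(friends, "alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']
```

In SQL, the same question typically needs one self-join per hop or a recursive query where the database supports one, which is where the extra code in a relational approach comes from.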

Riak

An open source distributed database written in Erlang and C, Riak treats all nodes equally: no node is a master or a slave, so there is no master to become a single point of failure. However, features such as multi-datacenter replication and SNMP monitoring are not available in the open source version. Much simpler than its peers (such as Cassandra), Riak is optimal for places where even seconds of downtime would hurt.
-- Deep Mistry
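
Riak's masterless design rests on consistent hashing: each key is hashed with SHA-1 onto a 160-bit ring that is divided into partitions owned by the cluster's nodes, and each value is stored on N consecutive partitions (the n_val setting, 3 by default). Because every node computes placement from the same ring, any node can coordinate a request, and when a node is down, a fallback takes its writes and hands them back later (hinted handoff). The Python sketch below illustrates only the placement logic. Its class and method names are hypothetical, and the small partition count, round-robin ownership, and simple skip-the-dead-node fallback are simplifying assumptions rather than Riak's actual claim algorithm.

```python
import hashlib
from typing import FrozenSet, List

RING_BITS = 160  # SHA-1 yields a 160-bit integer, which defines the ring's key space


class Ring:
    """Toy consistent-hashing ring with round-robin partition ownership and n_val replicas."""

    def __init__(self, nodes: List[str], partitions: int = 8, n_val: int = 3) -> None:
        if partitions % len(nodes) != 0:
            raise ValueError("partitions must be a multiple of the node count in this sketch")
        if n_val > len(nodes):
            raise ValueError("n_val cannot exceed the number of nodes")
        self.partitions = partitions
        self.n_val = n_val
        # Every node claims an equal share of partitions; no node is special.
        self.owners = [nodes[i % len(nodes)] for i in range(partitions)]

    def partition_of(self, key: str) -> int:
        # Hash the key onto the ring, then scale the 160-bit value down to a partition index.
        h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")
        return (h * self.partitions) >> RING_BITS

    def preference_list(self, key: str, down: FrozenSet[str] = frozenset()) -> List[str]:
        """Walk the ring clockwise from the key's partition, collecting n_val distinct live nodes."""
        start = self.partition_of(key)
        chosen: List[str] = []
        for step in range(self.partitions):
            owner = self.owners[(start + step) % self.partitions]
            if owner not in down and owner not in chosen:
                chosen.append(owner)
                if len(chosen) == self.n_val:
                    break
        return chosen


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"])
    replicas = ring.preference_list("user:42")
    print(replicas)
    # Any node computes the same list; with the first replica down, the walk moves on.
    print(ring.preference_list("user:42", down=frozenset({replicas[0]})))
```

Real Riak clusters use many more partitions (64 by default) and spread them across physical nodes as virtual nodes, which evens out load when nodes join or leave.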

Redis

There are many NoSQL databases, but Redis remains close to our heart because it has so many features that some call it a "data structure store." You don't just store numbers and strings -- you can dump in entire hashes, lists, sets, and other complicated structures. Then, to make the deal sweeter, Redis offers replication and persistence.
-- Peter Wayner
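
Redis binds each key to one data type and offers type-specific commands: LPUSH and LRANGE for lists, SADD for sets, HSET and HGETALL for hashes. A command aimed at a key holding a different type fails with a WRONGTYPE error. The pure-Python toy below mimics those semantics in memory to show what storing structures, rather than just strings, means. It is not a Redis client (a real application would talk to a Redis server through a client library such as redis-py), and the class and method names are hypothetical.

```python
from typing import Dict, List, Set, Union

Value = Union[str, List[str], Set[str], Dict[str, str]]


class WrongTypeError(Exception):
    """Mirrors Redis's WRONGTYPE error for a command used on a key of another type."""


class ToyDataStructureStore:
    """In-memory toy mimicking a few Redis commands; no persistence or replication."""

    def __init__(self) -> None:
        self._data: Dict[str, Value] = {}

    def _fetch(self, key: str, kind: type, create: bool) -> Value:
        # Look up a key and enforce its type; missing keys read as empty containers.
        if key not in self._data:
            if not create:
                return kind()
            self._data[key] = kind()
        value = self._data[key]
        if not isinstance(value, kind):
            raise WrongTypeError(f"{key!r} holds a {type(value).__name__}, not a {kind.__name__}")
        return value

    def lpush(self, key: str, *values: str) -> int:
        lst = self._fetch(key, list, create=True)
        for v in values:
            lst.insert(0, v)  # each value goes to the head, so the last one ends up first
        return len(lst)

    def lrange(self, key: str, start: int, stop: int) -> List[str]:
        lst = self._fetch(key, list, create=False)
        n = len(lst)
        if start < 0:
            start = max(n + start, 0)  # negative indexes count from the tail
        if stop < 0:
            stop = n + stop
        if stop < start:
            return []
        return lst[start:stop + 1]  # Redis ranges include the stop index

    def sadd(self, key: str, *members: str) -> int:
        members_set = self._fetch(key, set, create=True)
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before  # number of members that were actually new

    def hset(self, key: str, field: str, value: str) -> int:
        fields = self._fetch(key, dict, create=True)
        is_new = field not in fields
        fields[field] = value
        return int(is_new)  # 1 if the field was created, 0 if it was updated

    def hgetall(self, key: str) -> Dict[str, str]:
        return dict(self._fetch(key, dict, create=False))


if __name__ == "__main__":
    store = ToyDataStructureStore()
    store.lpush("recent", "post:1", "post:2", "post:3")
    print(store.lrange("recent", 0, 1))  # ['post:3', 'post:2']
```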