Best of NoSQL: 7 document databases compared

Which document-oriented database is right for your app? Follow this guide to the most developer-friendly NoSQL databases


“The right tool for the right job.” If such wisdom holds true anywhere, it certainly holds true with the choice of database a developer picks for a given application. Document databases, one of the family of data products collectively referred to as “NoSQL,” are for developers who want to focus on their application rather than the database technology.

With a document database, data is not stored in tables with distinct column types. Instead, it’s stored in freeform “documents” with any number of fields and any number of nested structures. Such documents are typically represented as JSON, and updated either by way of APIs or by sending JSON to a REST endpoint. Most every modern programming language supports JSON and REST, so working with a document database feels more like working natively with those data structures than working with a traditional database.
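To make the idea of a freeform document concrete, here is a minimal Python sketch. The "customer" document, its field names, and its nesting are invented for illustration and are not tied to any particular database; the point is only that the native data structure and the JSON sent over the wire are nearly the same thing.

```python
import json

# A hypothetical "customer" document; every field name here is illustrative.
customer = {
    "_id": "customer:1001",
    "name": "Ada Example",
    "emails": ["ada@example.com"],
    "address": {"city": "Springfield", "postal_code": "00000"},
    "orders": [
        {"sku": "A-1", "qty": 2},
        # Documents in the same list need not share the same fields.
        {"sku": "B-7", "qty": 1, "gift_note": "Happy birthday"},
    ],
}

# Serialize to JSON, the form typically sent to a document database.
payload = json.dumps(customer, indent=2)

# Round-trip back into native structures and reach into nested fields.
restored = json.loads(payload)
first_sku = restored["orders"][0]["sku"]
print(first_sku)
```

Note that the second order carries a field the first one lacks, which a fixed table schema would not allow without a schema change.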

This schemaless design, as it is called, has its limitations. A developer must do more work to ensure that inserted data is consistent, because such consistency isn’t always guaranteed by the database itself. SQL, the standard-issue and widely understood language for database work, isn’t supported by most document databases, so those with existing database expertise must start from scratch. But the convenience, speed, scalability, and versatility of a document database are hard to beat when you’re writing an application that needs a protean, free-form data structure.
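The extra consistency work usually lands in application code. The following is a rough sketch of what that might look like, assuming a hypothetical "order" document with three required fields; the field names and the `validate_order` helper are assumptions made for this example, not part of any database's API.

```python
# Hypothetical rules for an "order" document: field name -> expected type.
REQUIRED_FIELDS = {"order_id": str, "customer": str, "items": list}


def validate_order(doc):
    """Return a list of problems found in a candidate order document."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


good = {"order_id": "o-17", "customer": "ada", "items": [{"sku": "A-1"}]}
bad = {"order_id": 17, "items": []}  # numeric ID, no customer

print(validate_order(good))
print(validate_order(bad))
```

A schemaless store would accept both documents as written, so a check like this has to run before the insert if the application depends on those fields.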

Here we’ve profiled seven of the best known and most widely used document databases. Four of the seven—CouchDB, Couchbase Server, MongoDB, and RethinkDB—are open source projects with few or no practical barriers to getting started; Couchbase and MongoDB are also available in supported enterprise editions under commercial licenses. The other three—Amazon DynamoDB, Google Firebase, and IBM Cloudant—are hosted services from major cloud vendors, where close integration with other services in those clouds is a big draw.

See the table below to compare features. Read on for brief discussions of each database.

Amazon DynamoDB

Amazon’s DynamoDB document store began life in 2012 as an extension of Amazon’s SimpleDB. Under the hood it is powered by a key-value store, Dynamo. A co-developer of Dynamo would later draw on many of the same ideas to create Apache Cassandra. Of course, you won’t find DynamoDB in an open source incarnation. It’s available exclusively as a hosted offering on the Amazon cloud.

Like most of Amazon’s other cloud offerings, DynamoDB is a pay-as-you-go-for-what-you-need managed service. Developers set how much storage capacity to provide for keeping either unstructured documents or key-value pairs, and choose a flat hourly rate limit for read and write requests to the database. No need to provision servers or configure replication—Amazon handles all of that under the covers, and recently added autoscaling to the mix.

Naturally, DynamoDB offers developers useful integrations with other services in the Amazon cloud. Triggers, for instance, can be set up by way of AWS Lambda functions. Amazon’s BI and analysis tools are also nearby. The proximity to these services is convenient, but it also means Amazon can upsell functionality any number of ways. Caching and acceleration a la Redis, for instance, are available by way of the DynamoDB Accelerator, a cost-plus add-on.

Unlike many other cloud-native databases, DynamoDB is also available in a version that can be downloaded and run locally. However, DynamoDB Local is not intended for production use, but rather as a way to stage an application in a test environment without requiring connectivity or running up an Amazon bill.

Couchbase Server

Couchbase is less a sibling to CouchDB than a successor. Couchbase was built on work done in CouchDB and Membase, but it is now developed separately from both of those projects. It’s a document database and distributed key-value store rolled into one, with advanced features like automated failover and cross-datacenter replication, intended for enterprise use cases.

One feature that sets Couchbase apart from CouchDB and other competitors is its SQL-like query language called N1QL (pronounced “nickel”). N1QL doesn’t offer the full range of commands you would expect from an ANSI SQL implementation—at least, not yet—but it provides enough, such as JOIN operations, for someone with SQL experience to get workable results. The Couchbase query system is not just for developers, but for the DBAs and business analysts who normally deal with conventional databases. Features like the EXPLAIN keyword seem to have been put in specifically to appeal to that crowd.

As a combination document database and key-value store, Couchbase stores documents by using their unique identifiers as the key. Documents can also be assigned time-to-live values, to function like a key-value cache. That said, a true key-value caching system like Redis will be far faster for basic key-value storage, but Couchbase is more flexible, and Redis and Couchbase can be combined effectively to speed things up. On that note, Couchbase has native support for the Memcached protocol, so existing applications that use Memcached can plug into Couchbase as a substitute.
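The combination of ID-keyed documents and time-to-live values can be pictured with a toy in-memory store. This is a conceptual sketch only: the `TTLDocumentStore` class and its methods are invented for illustration and do not reflect the Couchbase SDK or how the server implements expiry.

```python
import time


class TTLDocumentStore:
    """Toy in-memory store: documents keyed by ID, with optional expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated
        self._items = {}     # key -> (document, expires_at or None)

    def upsert(self, key, document, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (document, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        document, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily evict the expired document
            return None
        return document


# Drive the store with a fake clock so the example is deterministic.
now = [0.0]
store = TTLDocumentStore(clock=lambda: now[0])
store.upsert("session:42", {"user": "ada"}, ttl_seconds=30)  # cache-like entry
store.upsert("user:ada", {"name": "Ada"})                    # durable entry

now[0] = 31.0  # 31 seconds later
print(store.get("session:42"))
print(store.get("user:ada"))
```

The session entry behaves like a cache item and disappears after its time-to-live, while the document without a TTL stays put.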

Couchbase Server comes in a full-blown for-pay edition, a free-to-use community edition, and an open source edition, which is the foundation for the others. The community edition can be deployed in production, but lacks the more advanced features of the enterprise edition as well as support, so non-buyer beware. Some features in Couchbase, such as its horizontal scaling functionality, have found their way into the CouchDB project, but that is more the exception than the rule.

Another edition of Couchbase worthy of note for app developers is Couchbase Lite, an embeddable version of Couchbase that can synchronize with instances of the full-blown edition. Couchbase Lite is the key component in Couchbase Mobile, an application stack for mobile apps that need a data store that synchronizes automatically with a back end. Couchbase Mobile is available for iOS, Android, Java, .NET, macOS, and tvOS.

CouchDB

The CouchDB project was begun in 2005 by a former IBM developer and moved to the Apache Software Foundation in 2008. It is sometimes assumed that CouchDB is the basis for Couchbase, but CouchDB and Couchbase are parallel projects with different aims. Whereas Couchbase is both a document database and a key-value store, CouchDB is strictly a document database. And while Couchbase has long focused on enterprise features such as fault tolerance and a SQL-like query language, such niceties are only beginning to arrive in CouchDB.

CouchDB emphasizes simplicity of deployment and ease of use. Retrieving data from the database just involves sending JSON-formatted queries to a REST HTTPS endpoint, with the results returned in JSON. Most every modern programming language can do these things, and also perform the mapping and reducing needed to create the views behind CouchDB queries and reports. There is no need for an ODBC driver or a data connector.
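The mapping and reducing behind a view can be sketched in a few lines. CouchDB views are commonly written as JavaScript functions stored in the database; the Python below only imitates the idea locally, and the sample documents and the `map_order_totals`, `reduce_sum`, and `build_view` names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical documents; only some of them are orders.
docs = [
    {"_id": "1", "type": "order", "customer": "ada", "total": 30},
    {"_id": "2", "type": "order", "customer": "bob", "total": 12},
    {"_id": "3", "type": "order", "customer": "ada", "total": 8},
    {"_id": "4", "type": "note", "text": "not an order"},
]


def map_order_totals(doc):
    """Map step: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def reduce_sum(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def build_view(documents, map_fn, reduce_fn):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


totals_by_customer = build_view(docs, map_order_totals, reduce_sum)
print(totals_by_customer)
```

The map function decides which documents contribute and under what key; the reduce function turns each key's values into a single result, here an order total per customer.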

One of CouchDB’s special sauces is its data reconciliation technology. Changes made to one CouchDB peer are automatically reconciled with others, in a manner akin to a version control system. Any conflicts between document versions are retained as if they were previous revisions to that document.
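A simplified model may help show what it means to retain a conflict rather than discard it. The sketch below assumes each document version carries a list of revision IDs as its history; the ID format, the `reconcile` function, and the tie-breaking rule are stand-ins chosen for illustration and should not be read as CouchDB's actual algorithm.

```python
def reconcile(a, b):
    """Return (winner, conflicts) for two versions of the same document.

    Toy rules: if one history extends the other, the longer one simply wins.
    If the histories diverged, pick a winner deterministically and keep the
    loser as a conflict instead of throwing it away.
    """
    ha, hb = a["history"], b["history"]
    if ha[:len(hb)] == hb:  # b is an ancestor of (or identical to) a
        return a, []
    if hb[:len(ha)] == ha:  # a is an ancestor of b
        return b, []
    # Diverged edits: longer history wins, then the higher last revision ID.
    winner, loser = sorted(
        [a, b],
        key=lambda v: (len(v["history"]), v["history"][-1]),
        reverse=True,
    )
    return winner, [loser]


base = {"history": ["1-aaa"], "body": {"title": "Draft"}}
phone = {"history": ["1-aaa", "2-bbb"], "body": {"title": "Draft v2 (phone)"}}
laptop = {"history": ["1-aaa", "2-ccc"], "body": {"title": "Draft v2 (laptop)"}}

# A peer that only has the base version just catches up: no conflict.
print(reconcile(base, phone))

# Two peers edited the same base independently: one wins, one is retained.
winner, conflicts = reconcile(phone, laptop)
print(winner["body"], [c["body"] for c in conflicts])
```

Because the rule is deterministic, every peer that sees the same two versions picks the same winner, and the losing edit remains available for the application to inspect or merge.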

This eventually consistent model is useful for databases that aren’t always or consistently connected (such as for intermittently connected mobile applications), or in cases where you don’t need the latest-and-greatest version of data in a particular node. But it’s also one of CouchDB’s biggest caveats. If you do need immediate consistency, CouchDB is not the place to find it.

Scalability has long been a weak spot for CouchDB, but it has recently been addressed. Version 2.0 stirred in a new clustering technology, courtesy of bits open sourced by Cloudant/IBM and merged into the project. Finally, for those who are familiar with MongoDB and want to use a similar declarative query syntax, the Mango project, also from Cloudant/IBM, provides that as an external add-on.

Google Firebase Realtime Database

The Firebase Realtime Database is just one component in the Firebase stack, which includes authentication, performance monitoring, user analytics, and many other functions. The whole stack is intended for building apps heavy on audience engagement and insight, but you might think of the database as Google’s answer to DynamoDB—a way to provide fast-syncing data storage between a cloud back-end and local apps on multiple platforms.

Google acquired Firebase in 2014. In the years since, it has wired up Firebase to take advantage of many Google Cloud features. Google Cloud Functions for Firebase, for instance, allows you to trigger JavaScript functions in the cloud in response to Firebase events. Google Analytics for Firebase lets you pull mobile app data into BigQuery for deeper analysis.

As gaming is one of Firebase’s target applications, the SDKs provided for Firebase include the Unity cross-platform game development framework. Developers working on more conventional enterprise-focused or consumer-facing projects have plenty of other choices: native iOS and Android, C++, generic web/JavaScript, and any other language that supports REST (Java, Python, you name it).

Firebase is designed to work in scenarios where connectivity isn’t guaranteed. Like CouchDB, it caches changes locally when offline, and automatically synchronizes with the back end when connectivity returns. Note that Firebase isn’t designed to be used as a standalone, entirely offline solution; on Android, for instance, local databases are limited to 10 MB in storage.
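The cache-locally-then-sync pattern can be illustrated with a small simulation. Nothing here uses the Firebase SDK: the `OfflineFirstClient` class is hypothetical, and a plain dictionary stands in for the remote database.

```python
class OfflineFirstClient:
    """Toy client that queues writes while offline and replays them later."""

    def __init__(self, backend):
        self.backend = backend  # dict standing in for the remote database
        self.local = {}         # local cache, always updated immediately
        self.pending = []       # writes made while offline, in order
        self.online = False

    def set(self, path, value):
        self.local[path] = value
        if self.online:
            self.backend[path] = value
        else:
            self.pending.append((path, value))

    def go_online(self):
        """Replay queued writes in order, then resume writing through."""
        self.online = True
        for path, value in self.pending:
            self.backend[path] = value
        self.pending.clear()


remote = {}
client = OfflineFirstClient(remote)

client.set("scores/ada", 10)  # offline: visible locally only
client.set("scores/ada", 25)  # offline: later write to the same path
print(remote)

client.go_online()            # connectivity returns
client.set("scores/bob", 7)   # online: written through immediately
print(remote)
```

The app reads and writes against the local cache without waiting on the network, and the back end converges once the queued writes are replayed.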

IBM Cloudant

Cloudant is essentially IBM’s hosted edition of CouchDB. Originally, Cloudant was an independent company, offering an edition of CouchDB called “BigCouch” hosted on IBM’s SoftLayer cloud. In 2014, IBM acquired Cloudant outright as part of IBM’s overall push towards analytics and big data. Today Cloudant is positioned mainly to developers on IBM’s Bluemix PaaS, where they have multiple data products (Cloudant, dashDB, DataWorks, and Watson Analytics) to choose from, each with different use cases.

Cloudant is meant to be more than a hosted version of CouchDB. Cloudant provides features not readily available in CouchDB itself, such as natively integrated full-text search. Full-text search in CouchDB typically requires integration with external projects.

A behind-the-firewall edition of Cloudant, called Cloudant Local, offers all of the same functionality as the as-a-service offering. It’s available on the Ubuntu and Red Hat flavors of x86 Linux, as well as IBM’s own System z running Red Hat or SUSE. Developers can snag a free, test-and-dev-only version in a Docker image. Data can be replicated in both directions between Cloudant and an instance of CouchDB, so it’s relatively easy to move between the two as needed.

Some of Cloudant’s improvements to CouchDB have found their way back into the underlying CouchDB project, including CouchDB 2.0’s horizontal scaling functionality and the Mango query language interface. But don’t take that as proof that Cloudant features will automatically trickle down to CouchDB.

MongoDB

MongoDB is easily the most widely deployed document database, and the best-known among the developer community. It embodies most of the key concepts found in document databases and NoSQL systems generally: schemaless storage, a scale-out architecture, and a shared-nothing design.

The open source edition of MongoDB already includes the vast majority of the features needed to gin up a basic production deployment. Commercial licenses add key enterprise features including backup, automation extensions, monitoring, data exploration tools, a BI connector with SQL support, and an in-memory storage engine.
