NoSQL grudge match: MongoDB vs. Couchbase Server

Which document database? From ease of installation and backup flexibility to index design and query capabilities, a few key differences point the way


Choosing the right database for the job can be a daunting task, particularly if you’re entertaining the full space of SQL and NoSQL options. If you’re looking for a flexible, general-purpose option that allows for fluid schemas and complex nested data structures, a document database might be right for you. MongoDB and Couchbase Server are two popular choices. How should you choose?

MongoDB combines the benefits of immense popularity, support for simple graph searches, and the ability to perform SQL queries via a BI connector. Couchbase has its own large community of users, a performant key-value architecture, and a SQL-like query language capable of navigating nested document structures.

In short, both MongoDB and Couchbase are powerful and flexible document-oriented databases with plenty of extras. That said, they have important differences that tilt the balance one way or the other, depending on your needs. To help you decide, we’ll march these databases through the gauntlet of key considerations, covering how each performs with regard to installation and setup, administration, ease of use, scalability, and documentation.

This discussion is based on MongoDB 3.4 and Couchbase Server 4.6. You might also check out my stand-alone reviews of MongoDB 3.4 and Couchbase Server 4.0.

Installation and setup

Installation and setup can be viewed from two perspectives: developers working against a local instance and infrastructure engineers setting up an initial production cluster. Many NoSQL databases have strong stories around developer friendliness, increasing the chances of a developer trying out the product and introducing it to their systems. A straightforward local setup is a strong selling point. On the other hand, the database will ultimately prove its worth in production, so the production setup is just as important to get right.

Developer setup

Rather than use binaries running on bare metal, we’ll look at what it takes to set up these two databases in a Docker environment. The Docker setup for both MongoDB and Couchbase is pretty straightforward. Couchbase requires a few extra ports to be exposed, but that’s easy to handle. Once the images are pulled down and the containers start up, though, there’s a noticeable difference in the developer experience. With MongoDB, you’re done: you can connect via an application or the Mongo shell and get to work immediately. By contrast, Couchbase walks you through a mandatory setup process in the UI, where you’re faced with a series of configuration options geared toward infrastructure engineers. As a developer, you can keep the default selections and use a default bucket, but the extra step adds friction.
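For reference, both containers can be started with plain docker run commands. The Python sketch below only assembles the argument lists. The Couchbase port mapping (8091-8094 plus 11210) follows the layout Couchbase’s Docker Hub instructions used for 4.x-era images, and the container names and image tags are assumptions to adjust for the versions you’re evaluating.

```python
import shlex
from typing import List

def mongo_run_args(tag: str = "3.4") -> List[str]:
    """docker run arguments for a throwaway local MongoDB container."""
    return [
        "docker", "run", "-d", "--name", "mongo-dev",  # container name is arbitrary
        "-p", "27017:27017",                           # MongoDB's default port
        f"mongo:{tag}",
    ]

def couchbase_run_args(tag: str = "latest") -> List[str]:
    """docker run arguments for a local Couchbase Server container."""
    return [
        "docker", "run", "-d", "--name", "couchbase-dev",
        "-p", "8091-8094:8091-8094",  # web console/REST, views, query, full-text search
        "-p", "11210:11210",          # key-value data port used by the SDKs
        f"couchbase:{tag}",           # pin a specific tag rather than "latest" in practice
    ]

if __name__ == "__main__":
    for args in (mongo_run_args(), couchbase_run_args()):
        print(shlex.join(args))
```

Passing either list to subprocess.run(args, check=True) starts the container. MongoDB accepts connections on 27017 as soon as it’s up; Couchbase still expects you to finish the setup wizard at http://localhost:8091 before the cluster is usable.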

MongoDB wins this one, but not without a caveat. Just because the local deployment was easy doesn’t mean you can do the same thing in production. It may seem obvious that production environments require more care and configuration, but the widespread ransom attacks on unsecured, publicly accessible MongoDB instances earlier this year suggest that many shops are taking dangerous shortcuts.

Round winner: MongoDB.

Production setup

Deploying a distributed database to production tends to involve many steps and a fair degree of coordination; MongoDB and Couchbase are no different. In both cases, the difficulty of setup will depend on the requirements of the deployment, with different performance trade-offs involving different levels of complexity.  

A MongoDB deployment is either a replica set or a sharded cluster. A replica set is a group of MongoDB servers that all contain the same data, whereas a sharded cluster distributes data across a number of replica sets. Replica sets are simple to configure, because only a single type of server needs to be deployed. Sharded clusters are more involved, requiring three different types of components: the shards themselves, config servers that hold the cluster’s metadata, and mongos query routers. Shards and config servers are each deployed as replica sets, and you typically run more than one mongos. Clusters can be configured via command-line flags, configuration files, and database commands.
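To make the replica set case concrete: a replica set is brought up by starting each mongod with the same --replSet name, then sending a configuration document to one member with the replSetInitiate command (rs.initiate() in the shell). The Python sketch below only builds that document; the set name and host names are hypothetical.

```python
import json
from typing import Dict, List

def replica_set_config(name: str, hosts: List[str]) -> Dict:
    """Build the configuration document accepted by replSetInitiate / rs.initiate()."""
    if not hosts:
        raise ValueError("a replica set needs at least one member")
    return {
        "_id": name,  # must match the --replSet name each mongod was started with
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }

config = replica_set_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
)
print(json.dumps(config, indent=2))

# With PyMongo, you could send it to one member like this:
#   from pymongo import MongoClient
#   client = MongoClient("db1.example.net", 27017, directConnection=True)
#   client.admin.command("replSetInitiate", config)
```

A sharded cluster repeats this for every shard and for the config server replica set (whose document also sets "configsvr": true), then puts mongos routers in front of them.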

Couchbase clusters can consist of a single server type or multiple server types, depending on the performance characteristics you need. The Couchbase architecture consists of different services (data, index, query, and so on) that can be enabled or disabled on a per-node basis. In a simple scenario, you enable all services on all nodes. However, if you want to tune hardware to each service’s needs or scale each service independently, you will have to start configuring different server types: commodity hardware for the data service, SSDs for the index service, CPU-optimized machines for the query service, and so on. Clusters can be configured via the built-in web UI, the command-line interface, and the REST API.
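The per-node service assignment is easy to reason about as data. In the Python sketch below, a hypothetical four-node layout maps each host to the services it runs, and two small checks flag any service that no node provides and any service that lives on only one node. The host names and layout are assumptions; the service names mirror the data, index, and query services described above.

```python
from typing import Dict, Set

REQUIRED_SERVICES: Set[str] = {"data", "index", "query"}

# Hypothetical layout: two commodity data nodes, one SSD-backed index node,
# and one CPU-heavy query node.
layout: Dict[str, Set[str]] = {
    "cb1.example.net": {"data"},
    "cb2.example.net": {"data"},
    "cb3.example.net": {"index"},
    "cb4.example.net": {"query"},
}

def missing_services(layout: Dict[str, Set[str]],
                     required: Set[str] = REQUIRED_SERVICES) -> Set[str]:
    """Services that no node in the layout provides."""
    enabled = set().union(*layout.values())
    return required - enabled

def single_points_of_failure(layout: Dict[str, Set[str]]) -> Set[str]:
    """Services enabled on exactly one node, so losing that node loses the service."""
    counts: Dict[str, int] = {}
    for services in layout.values():
        for service in services:
            counts[service] = counts.get(service, 0) + 1
    return {service for service, n in counts.items() if n == 1}

print(missing_services(layout))                  # set()
print(sorted(single_points_of_failure(layout)))  # ['index', 'query']
```

Enabling the index service on a second node and the query service on another clears both warnings, while the all-services-on-every-node setup never trips either check.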

As far as production setup of data infrastructure goes, both MongoDB and Couchbase are fairly clear-cut. Sure, you can dive into configuration and tuning options and never come out, but in most cases setup will fall on the easier end of the spectrum for infrastructure engineers.

Round winner: Tie. 

Administration

Once the database is running in production and accepting traffic, administration becomes a key concern. To evaluate the ease of administration, I’ll look at the backup process, database upgrades, and monitoring approaches.

Backups

Backups are an important part of production database hygiene, and running databases in a highly available, distributed fashion doesn’t change that one bit.

MongoDB offers several options for backing up the data of a running cluster. If the underlying file system or volume manager (LVM, for example) supports point-in-time snapshots, you can rely on that feature to capture a backup at a precise moment in time. This gets trickier for sharded clusters, because you have to stop the balancer and snapshot a secondary of each shard and a config server at the same time.

System-level tools like cp or rsync can also copy the database files to another location, but because those tools don’t capture a consistent point-in-time copy, writes must be paused for the duration. MongoDB also ships with command-line tools (mongodump and mongorestore) to back up and restore databases, but these are not recommended for larger clusters. Alternatively, you can pay for Cloud Manager or Ops Manager, or deploy through the MongoDB Atlas DBaaS platform, to get UI-based tooling that takes care of backups and restores for you.

Couchbase ships with command-line tools to back up data from its various services, and these can be configured to run full backups or two kinds of incremental backups. An incremental backup can capture changes since the last full backup (cumulative incremental) or since the last backup of any kind (differential incremental). Mixing the two lets you trade storage space against restore complexity: cumulative backups grow over time but leave fewer pieces to restore, while differential backups stay small but must all be applied in order.
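The restore-side consequence of mixing the two kinds can be worked out mechanically. The Python sketch below models only the definitions above (cumulative means changes since the last full backup, differential means changes since the previous backup of any kind) and returns the backups needed to restore to the newest point. It’s a conceptual model; the Backup type and restore_chain function are hypothetical and not part of Couchbase’s tooling.

```python
from dataclasses import dataclass
from typing import List

KINDS = {"full", "cumulative", "differential"}

@dataclass(frozen=True)
class Backup:
    seq: int   # position in the backup history (higher = newer)
    kind: str  # "full", "cumulative", or "differential"

def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Backups needed, in order, to restore to the newest point in the history."""
    for b in backups:
        if b.kind not in KINDS:
            raise ValueError(f"unknown backup kind: {b.kind}")
    ordered = sorted(backups, key=lambda b: b.seq)
    full_positions = [i for i, b in enumerate(ordered) if b.kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")
    start = full_positions[-1]
    chain = [ordered[start]]          # the newest full backup is the base
    rest = ordered[start + 1:]
    cumulative_positions = [i for i, b in enumerate(rest) if b.kind == "cumulative"]
    if cumulative_positions:
        # The newest cumulative already holds everything since the full backup,
        # so differentials taken before it can be skipped.
        last = cumulative_positions[-1]
        chain.append(rest[last])
        rest = rest[last + 1:]
    # Each remaining differential holds only changes since the backup before it,
    # so every one must be applied, oldest first.
    chain.extend(b for b in rest if b.kind == "differential")
    return chain

history = [
    Backup(1, "full"), Backup(2, "differential"), Backup(3, "differential"),
    Backup(4, "cumulative"), Backup(5, "differential"), Backup(6, "differential"),
]
print([b.seq for b in restore_chain(history)])  # [1, 4, 5, 6]
```

The two early differentials drop out because the cumulative backup at step 4 supersedes them, which is the storage-versus-restore-effort trade-off in miniature.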

Enterprise customers can draw on the cbbackupmgr utility, which uses different underlying data structures to achieve better performance when backing up data.

Round winner: Couchbase, due to its greater flexibility and support for incremental backups.

Upgrading

A long-running cluster should have a clear, easy upgrade path. The harder it is to upgrade, the less likely it will be kept up-to-date. That means developers and administrators alike will miss out on new features.

MongoDB upgrades are best understood at the replica set level. If you’re running a sharded cluster, you mostly follow the steps for upgrading a replica set on each shard. Within a replica set, each secondary is shut down, upgraded in place, and restarted. Once the secondaries are running and caught up with the primary, the primary is stepped down (rs.stepDown()) to force a failover, and the former primary can then be shut down and upgraded. It starts up again as a secondary and catches up on the writes it missed while offline. Upgrades are thus mostly an online process, but the failover will likely mean 10 to 20 seconds without writes, so you’ll still want a maintenance window that can tolerate that brief outage.
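The ordering is what matters here, so a small Python sketch can capture it: secondaries first, then a step-down of the primary, then the former primary. Host names are hypothetical, arbiters are left out, and each “upgrade” step stands for shutting the member down, swapping binaries, and restarting it.

```python
from typing import Dict, List

def rolling_upgrade_plan(members: Dict[str, str]) -> List[str]:
    """Order the steps for an in-place upgrade of one replica set.

    members maps host -> "primary" or "secondary" (arbiters left out for brevity).
    """
    primaries = [host for host, role in members.items() if role == "primary"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    primary = primaries[0]
    secondaries = sorted(host for host, role in members.items() if role == "secondary")

    steps: List[str] = []
    for host in secondaries:
        # Shut down, replace binaries, restart; the set keeps serving writes meanwhile.
        steps += [f"upgrade {host}", f"wait for {host} to catch up as SECONDARY"]
    # rs.stepDown() forces an election; writes pause until a new primary takes over.
    steps.append(f"step down {primary}")
    steps += [f"upgrade {primary}", f"wait for {primary} to catch up as SECONDARY"]
    return steps

for step in rolling_upgrade_plan({"db1": "primary", "db2": "secondary", "db3": "secondary"}):
    print(step)
```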

Couchbase handles upgrades the same way it handles adding or removing a node from a cluster. All of the data on the node being upgraded must be rebalanced across the rest of the cluster, then rebalanced again when the upgrade is complete and the node rejoins. That rebalancing has to happen for each node in the cluster, one after another. Because of all the data that must be moved around, this takes a lot longer than upgrading a MongoDB cluster. The alternative is to take the whole cluster offline, upgrade each node, and bring them all back online.
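A back-of-the-envelope model shows why this adds up. Assume, simplistically, that data is spread evenly, that taking a node out moves roughly its share of the data onto the other nodes, and that rejoining moves about the same amount back. Upgrading every node then shuffles about twice the data the cluster stores, whereas an in-place MongoDB upgrade moves no bulk data, only the writes each restarted member missed. The function and figures below are illustrative assumptions, not measurements.

```python
def rebalance_upgrade_traffic_gb(nodes: int, gb_per_node: float) -> float:
    """Rough total data moved when every node is upgraded via remove/re-add rebalances.

    Simplifying assumptions: data is spread evenly, removing a node moves about
    gb_per_node onto the others, and re-adding it moves about the same amount back.
    """
    if nodes < 2:
        raise ValueError("rebalancing needs at least two nodes")
    per_node_upgrade = 2 * gb_per_node  # out once, back in once
    return nodes * per_node_upgrade

print(rebalance_upgrade_traffic_gb(nodes=6, gb_per_node=500))  # 6000
```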

While the Couchbase upgrade path requires zero downtime, the process is long and requires a copious amount of data shuffling to work.

Round winner: Tie. Tiebreaker: If maintenance downtime is acceptable, then MongoDB wins. If not, then Couchbase is the only choice.

Monitoring

Visibility into a running cluster is obviously essential to successful database administration. When things are going wrong, nothing is worse than having a constrained view of the truth in the cluster.

MongoDB offers CLI tools (mongostat and mongotop) and shell commands (such as db.serverStatus()) that report metrics on instance activity and performance. Beyond that, MongoDB will helpfully point you to third-party tools or its own enterprise products (Cloud Manager, Ops Manager, Atlas).

Couchbase, on the other hand, ships with a web UI that includes statistics and visualizations for instances, nodes, query performance, and more. Additionally, Couchbase can be configured to send email alerts when certain statistics fall out of range.


Couchbase provides metrics visualizations right out of the box, whereas MongoDB relies on companion tools. 

Round winner: Couchbase, for out-of-the-box visualizations and alerting.

Ease of use

After the database is set up and all of our administration needs are met, the major concern shifts from operations to usage. I’ll break that down into data modeling, index design, basic querying, and aggregations.

Data modeling

As document databases, neither MongoDB nor Couchbase can avoid the challenge of handling relational data. Both let you store related data either nested and denormalized inside a document or as references to other top-level documents. Choosing between those two approaches ends up being the main data modeling consideration for both databases, even as each supports a growing breadth of use cases, features, and query patterns.
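Here’s the distinction in miniature. The same kind of order can carry its customer embedded, or hold only a reference to a separate customer document that must be fetched by key. The documents, IDs, and helper functions are hypothetical, and a plain Python dict stands in for the database’s key lookup.

```python
from typing import Dict

# Embedded (denormalized): the order carries its own copy of the customer.
order_embedded = {
    "_id": "order::1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A-1", "qty": 2, "price": 9.5}, {"sku": "B-7", "qty": 1, "price": 20.0}],
}

# Referenced (normalized): the order stores only the customer's document ID.
store: Dict[str, dict] = {
    "customer::42": {"_id": "customer::42", "name": "Ada", "email": "ada@example.com"},
}
order_referenced = {
    "_id": "order::1002",
    "customer_id": "customer::42",
    "items": [{"sku": "C-3", "qty": 4, "price": 2.5}],
}

def customer_email(order: dict, store: Dict[str, dict]) -> str:
    if "customer" in order:
        return order["customer"]["email"]         # embedded: one read
    return store[order["customer_id"]]["email"]   # referenced: a second read by key

def order_total(order: dict) -> float:
    return sum(item["qty"] * item["price"] for item in order["items"])

print(customer_email(order_embedded, store), customer_email(order_referenced, store))
print(order_total(order_embedded))  # 39.0
```

Embedding saves the second read but duplicates customer data across orders, so an email change must touch every order; referencing keeps one copy at the cost of an extra lookup.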

Round winner: Tie.

Index design

Indexes perform the same function in document databases as they do in relational databases. That is, they represent certain data in more efficient ways to enhance query performance. MongoDB and Couchbase take very different approaches to index design and creation.

MongoDB supports index creation on one or more fields within a document, letting you specify the order of the fields and the direction (ascending or descending) of each in standard indexes. Geospatial and full-text indexes are created with the same syntax. The query engine will use those indexes, prefixes of those indexes, or a combination of several indexes (index intersection) to speed up requests.
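The prefix rule is worth spelling out. Given a compound index on status, created (descending), and user, a query constrained on status, on status and created, or on all three can use the index, while a query that skips the leading field generally cannot. The sketch below is a simplified model of that rule (equality constraints only, ignoring sorts and index intersection); the field names are hypothetical, and the PyMongo create_index call in the comment is how such an index would be declared.

```python
from typing import List, Set, Tuple

# Hypothetical compound index key pattern (1 = ascending, -1 = descending). With PyMongo it
# would be declared as collection.create_index([("status", 1), ("created", -1), ("user", 1)]).
INDEX: List[Tuple[str, int]] = [("status", 1), ("created", -1), ("user", 1)]

def usable_prefix(index: List[Tuple[str, int]], query_fields: Set[str]) -> List[str]:
    """Contiguous leading index fields that a query's equality constraints cover.

    Simplified model of MongoDB's index-prefix rule: an empty result means the
    query does not constrain the leading field, so the index can't narrow the scan.
    """
    prefix: List[str] = []
    for field, _direction in index:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix

print(usable_prefix(INDEX, {"status", "created"}))  # ['status', 'created']
print(usable_prefix(INDEX, {"status", "user"}))     # ['status']
print(usable_prefix(INDEX, {"created", "user"}))    # []
```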

Couchbase relies on two different mechanisms for improving query performance: MapReduce views and the Global Secondary Index (GSI). MapReduce views consist of user-defined JavaScript code that processes data as it passes through the system, like an incremental pre-aggregation. MapReduce views can be as simple as allowing document searches on an inner field, or they can include more complex logic that performs calculations and aggregations on the data within documents.
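A Couchbase view’s map function is JavaScript, for example function (doc, meta) { if (doc.type == "order") emit(doc.customer_id, doc.total); }, paired with a built-in reduce such as _sum. The Python stand-in below mimics the incremental idea: when a document changes, only its old rows are retracted and its new rows applied, so the per-key totals never require a full rescan. It’s a conceptual model, not Couchbase’s implementation, and the document shapes are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Emit = Callable[[str, int], None]

def order_totals_map(doc: dict, emit: Emit) -> None:
    """Python stand-in for the JavaScript map function described above."""
    if doc.get("type") == "order":
        emit(doc["customer_id"], doc["total"])

class IncrementalSumView:
    """Map rows per document plus a running _sum per key, updated on each change."""

    def __init__(self, map_fn: Callable[[dict, Emit], None]):
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Tuple[str, int]]] = {}
        self.totals: Dict[str, int] = {}

    def delete(self, doc_id: str) -> None:
        # Retract whatever this document emitted last time.
        for key, value in self.rows_by_doc.pop(doc_id, []):
            self.totals[key] -= value

    def upsert(self, doc_id: str, doc: dict) -> None:
        self.delete(doc_id)
        rows: List[Tuple[str, int]] = []
        self.map_fn(doc, lambda key, value: rows.append((key, value)))
        for key, value in rows:
            self.totals[key] = self.totals.get(key, 0) + value
        self.rows_by_doc[doc_id] = rows

view = IncrementalSumView(order_totals_map)
view.upsert("o1", {"type": "order", "customer_id": "c1", "total": 30})
view.upsert("o2", {"type": "order", "customer_id": "c1", "total": 12})
view.upsert("u1", {"type": "user", "name": "Ada"})  # emits nothing
view.upsert("o1", {"type": "order", "customer_id": "c1", "total": 10})  # retracts the old 30
print(view.totals)  # {'c1': 22}
```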

Writing MapReduce in JavaScript to support queries is unwieldy, so you’ll generally want to use the GSI where possible. GSI indexes are defined using N1QL (pronounced “nickel”), a partial SQL implementation on top of Couchbase. N1QL syntax is fairly clear, and GSI indexes are far easier to work with than MapReduce views, but each index lives on a specific node. If you want an index to be highly available, you have to manually create equivalent indexes on more than one node.
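In practice, that means creating identically defined indexes under different names, each pinned to a different index node with N1QL’s WITH {"nodes": [...]} clause; the query service should then be able to use whichever copy is available. The Python helper below just generates those statements. The bucket, index name, fields, and host names are hypothetical, and the fields are assumed to be top-level names.

```python
import json
from typing import List

def replicated_gsi_statements(bucket: str, index_name: str,
                              fields: List[str], nodes: List[str]) -> List[str]:
    """One CREATE INDEX statement per index node, each with a distinct index name.

    Assumes top-level field names; nested paths would need per-segment escaping.
    """
    if not nodes:
        raise ValueError("need at least one index node")
    keys = ", ".join(f"`{field}`" for field in fields)
    statements = []
    for i, node in enumerate(nodes):
        placement = json.dumps({"nodes": [node]})  # pin this copy to one index node
        statements.append(
            f"CREATE INDEX `{index_name}_{i}` ON `{bucket}`({keys}) "
            f"USING GSI WITH {placement};"
        )
    return statements

for stmt in replicated_gsi_statements(
    "orders", "idx_status", ["status", "created"],
    ["cb3.example.net:8091", "cb5.example.net:8091"],
):
    print(stmt)
```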

Round winner: MongoDB, for its consolidated indexing API and ability to avoid MapReduce altogether.

Basic queries

Given an appropriate data model, most queries to the database tend to be simple. Beyond CRUD operations where the ID of the document in question is known, it’s important to be able to express different ways of filtering documents and choose which fields we’re interested in.

MongoDB expresses queries as JSON-style documents, providing a declarative syntax for specifying conditions and filters on fields. A query document can contain any number of query selectors that describe what the result set should look like. Ranges, equality, text search, and geospatial queries can all be defined within this query document. Logical operators ($and, $or, $nor, $not) let you combine multiple clauses. Query documents can quickly grow into heavily nested JSON, which can be overwhelming at times and definitely takes some getting used to. Queries can also take projections, which return only the fields you care about and reduce the size of the result set sent over the wire.
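To make the shape of a query document concrete, here’s a toy evaluator for a small subset of the language: equality, $gt, $gte, $lt, $lte, $ne, and $in, plus $and and $or, dotted paths into nested documents (arrays are ignored), and an inclusion projection that keeps _id, as MongoDB does by default. It’s a teaching model written in Python, not how MongoDB executes queries; the sample documents are hypothetical, and the equivalent PyMongo call appears in a comment.

```python
from typing import Any, Callable, Dict, List, Optional

def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as "address.city"; None if any part is missing."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def _ordered(op: Callable[[Any, Any], bool]) -> Callable[[Any, Any], bool]:
    # Range comparisons never match a missing field.
    return lambda value, arg: value is not None and op(value, arg)

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$gt": _ordered(lambda value, arg: value > arg),
    "$gte": _ordered(lambda value, arg: value >= arg),
    "$lt": _ordered(lambda value, arg: value < arg),
    "$lte": _ordered(lambda value, arg: value <= arg),
    "$in": lambda value, arg: value in arg,
}

def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if doc satisfies every clause of query (clauses are implicitly ANDed)."""
    for key, cond in query.items():
        if key == "$and":
            ok = all(matches(doc, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(doc, sub) for sub in cond)
        elif isinstance(cond, dict):  # operator document, e.g. {"$lt": 30}
            value = get_path(doc, key)
            ok = all(OPS[op](value, arg) for op, arg in cond.items())
        else:                         # a bare value means equality
            ok = get_path(doc, key) == cond
        if not ok:
            return False
    return True

def find(docs: List[Dict[str, Any]], query: Dict[str, Any],
         projection: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    results = [doc for doc in docs if matches(doc, query)]
    if projection is not None:
        # Inclusion projection on top-level fields; _id is kept by default.
        results = [{k: doc[k] for k in ["_id", *projection] if k in doc} for doc in results]
    return results

people = [
    {"_id": 1, "name": "Ada", "age": 36, "address": {"city": "London"}},
    {"_id": 2, "name": "Grace", "age": 85, "address": {"city": "New York"}},
    {"_id": 3, "name": "Linus", "age": 27, "address": {"city": "Portland"}},
]
# PyMongo equivalent: collection.find(query, {"name": 1})
query = {"$or": [{"age": {"$lt": 30}}, {"address.city": "London"}]}
print(find(people, query, ["name"]))  # [{'_id': 1, 'name': 'Ada'}, {'_id': 3, 'name': 'Linus'}]
```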
