NoSQL showdown: MongoDB vs. Couchbase
MongoDB edges Couchbase Server with richer querying and indexing options, as well as superior ease of use
MongoDB is about three years old, first released in late 2009. The goal behind MongoDB was to create a NoSQL database that offered high performance and did not cast out the good aspects of working with RDBMSes. For instance, the way that queries are designed and optimized in MongoDB is similar to how that would be done in an RDBMS. MongoDB's designers also wanted to make the database easier for application developers to work with -- for example, by allowing developers to change the data model quickly. MongoDB, whose name is short for "humongous," stores documents in BSON (Binary JSON), an extension of JSON that allows for the use of data types such as integers, dates, and byte arrays.
Two primary processes are at work in a MongoDB system,
mongod process is the real workhorse. In a sharded MongoDB cluster,
mongod can be found playing one of two roles: config server or shard server. The config server tracks the cluster's metadata. (In a sharded MongoDB cluster, there must be at least three config servers for redundancy's sake.) Each config server knows which server in the cluster is responsible for a given document or, more precisely, where a given contiguous range of shard keys (called a chunk) belongs in the cluster.
mongod processes in the cluster run as shard servers, and these handle the actual reading and writing of the data. For fail-over purposes, two instances of a
mongod process on a given cluster member run as shard servers. One process is primary, and the other is secondary. All write requests go to the primary, while read requests can go to either primary or secondary.
Secondaries are updated asynchronously from the primary so that they can take over in the event of a primary's crash. This, however, means that some read requests (sent to secondaries) may not be consistent with write requests (sent to primaries). This is an instance of MongoDB's "eventual consistency." Over time, all secondaries will become consistent with write operations on the primary. Note that you can guarantee consistent read/write behavior by configuring a MongoDB system such that all I/O -- reads and writes -- go to the primary instances. In such an arrangement, secondaries act as standby servers, coming online only when the primary fails.
mongos process, which runs at a conceptually higher level than the
mongod processes, is best thought of as a kind of routing service. Database requests from clients arrive first at a
mongos process, which determines which shard(s) in a sharded cluster can service each request. The
mongos process dispatches I/O requests to the appropriate
mongod processes, gathers the results, and returns them to the client. Note that in a nonsharded cluster, clients talk directly to a
MongoDB scaling and replication
MongoDB doesn't have an explicit memory caching layer. Instead, all MongoDB operations are performed through memory-mapped files. Consequently, MongoDB hands off the chore of juggling memory caching versus persistence-to-disk to the operating system. You can tweak various flush-to-disk settings for optimal performance, however. For example, MongoDB maintains a journal of database operations (for recovery purposes) that is flushed to the disk every 100ms. Not only is this interval configurable, but you can configure the system so that write operations return only after the journal has been written to disk.
Documents are placed in named containers called collections, which are roughly equivalent to Couchbase's buckets. A collection serves as a means of partitioning related documents into separate groups. The effects of many multidocument operations in a MondoDB database are restricted to the collection in which those operations are performed. MongoDB supports sharding at the collection level, which means -- should requirements dictate -- you could construct a database with unsharded and sharded collections. Of course, only a sharded collection is protected against a single point of failure.
A document's membership in a particular shard is determined by a shard key, which is derived from one or more fields in each document. The exact fields can be specified by the database administrator. In addition, MongoDB provides autosharding, which means that, once you've configured sharding, MongoDB will automatically manage the storage of documents in the appropriate physical location. This includes rebalancing shards as the number of documents grows or the number of
mongod instances changes.
As of the 2.4 release, MongoDB supports both hash-based sharding and range-based sharding. As you might guess, hash-based sharding hashes the shard key, which creates a relatively even distribution of documents across the cluster. With range-based sharding (the sole sharding type prior to 2.4), a given member of a MongoDB sharded cluster will store all the documents within a given subrange of the shard key's overall domain. More precisely, MongoDB defines a logical container, called a chunk, which is a subset of documents whose shard keys fall within a specific range. The
mongos process then dictates which
mongod process will manage a given chunk.