NoSQL showdown: MongoDB vs. Couchbase

MongoDB edges Couchbase Server with richer querying and indexing options, as well as superior ease of use

Page 7 of 9

Typically, you permit the load balancer to determine which cluster member manages a given shard range. However, with version 2.4, you can associate tags with shard ranges (a tag being nothing more than an identifying string). Once that's done, you can specify which member of a cluster will manage any shard ranges associated with a tag. In a sense, this lets you override some of the load balancer's decision making and steer identifiable subsets of the database to specific servers. For example, you could put the data most frequently accessed from California on the cluster member in California, the data most frequently accessed from Texas on the cluster member in Texas, and so on.
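The tag mechanism can be pictured with a small pure-Python model. This is only a sketch of the idea, not MongoDB's balancer: the shard names, the use of a state code as the shard key, and the helper functions are all hypothetical. In the mongo shell, the corresponding helpers are sh.addShardTag() and sh.addTagRange().

```python
# Toy model of tag-aware sharding; not MongoDB's actual balancer logic.
# Which tags each cluster member (shard) carries. Names are hypothetical.
shard_tags = {
    "shard-ca": {"CA"},
    "shard-tx": {"TX"},
    "shard-other": set(),
}

# Shard-key ranges (inclusive lower bound, exclusive upper bound) and their tag.
# The shard key is assumed here to be a US state code.
tag_ranges = [
    ("CA", "CB", "CA"),
    ("TX", "TY", "TX"),
]


def tag_for_key(key):
    """Return the tag whose range contains the shard key, or None."""
    for low, high, tag in tag_ranges:
        if low <= key < high:
            return tag
    return None


def eligible_shards(key):
    """Return the shards allowed to hold the range containing this key."""
    tag = tag_for_key(key)
    if tag is None:
        # Untagged ranges stay under the balancer's normal control.
        return sorted(shard_tags)
    return sorted(name for name, tags in shard_tags.items() if tag in tags)
```

In this model, eligible_shards("CA") narrows the choice to the California member, while a key that falls in no tagged range can still land on any member.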

MongoDB locks at the database level; prior to version 2.2, the lock was global. The system implements shared-read, exclusive-write locking (many concurrent readers, but only one writer), with priority given to waiting writers over waiting readers. MongoDB reduces contention by yielding locks in the middle of operations. Version 2.2 added page-fault prediction: if a process requests a document that is not in memory, it yields its lock so that other processes -- whose documents are in memory -- can be serviced. Long-running operations also yield their locks periodically.
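The locking policy described here (many readers, one writer, waiting writers first) can be sketched in a few lines of Python. This illustrates the policy only; it is not how MongoDB implements its locks, and the class and method names are invented for the example.

```python
import threading


class WriterPriorityLock:
    """Shared-read, exclusive-write lock that favors waiting writers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active_readers = 0
        self._writer_active = False
        self._waiting_writers = 0

    def acquire_read(self):
        with self._cond:
            # New readers wait while a writer holds the lock or is queued.
            while self._writer_active or self._waiting_writers > 0:
                self._cond.wait()
            self._active_readers += 1

    def release_read(self):
        with self._cond:
            self._active_readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            try:
                # A writer needs the lock to itself: no readers, no other writer.
                while self._writer_active or self._active_readers > 0:
                    self._cond.wait()
            finally:
                self._waiting_writers -= 1
            self._writer_active = True

    def release_write(self):
        with self._cond:
            self._writer_active = False
            self._cond.notify_all()
```

Because acquire_read() checks for queued writers as well as an active one, a steady stream of readers cannot starve a writer, which is the behavior the priority rule is meant to guarantee.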

You'll find no clear notion of transactions in MongoDB. Certainly, you cannot perform pure ACID transactions on a MongoDB installation. Database changes can be made durable if you enable journaling, in which case write operations are blocked until the journal entry is persisted to disk (as described earlier). And MongoDB defines the $atomic isolation operator, which imposes what amounts to an exclusive-write lock on the document involved. However, $atomic is applied at the document level only. You cannot guard multiple updates across documents or collections.

MongoDB indexing and queries
MongoDB makes it easy to create secondary indexes on any document field. A primary index always exists on the document ID. As with Couchbase Server, the ID is automatically generated for each document. However, with MongoDB, you can instead supply your own value as the document's unique identifier. For example, a database of bank accounts might use the bank's generated account number as the document ID. Indexes exist at the collection level, and they can be compound -- that is, created on multiple fields. MongoDB can also handle multikey indexes: If you index a field that contains an array, MongoDB will index each value in that array. Finally, MongoDB supports geospatial indexes.
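To see what indexing "each value in that array" means, consider this toy inverted index in Python. It is a simplified illustration rather than MongoDB's index structure, and the sample accounts, field names, and build_index() helper are hypothetical.

```python
from collections import defaultdict


def build_index(documents, field):
    """Map each indexed value to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue  # simplification: this sketch skips documents lacking the field
        # Multikey behavior: an array field contributes one entry per element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            index[item].add(doc["_id"])
    return index


# Hypothetical bank accounts that use the account number as the document ID.
accounts = [
    {"_id": "ACC-1001", "owner": "Ada", "tags": ["savings", "joint"]},
    {"_id": "ACC-1002", "owner": "Grace", "tags": ["checking"]},
    {"_id": "ACC-1003", "owner": "Ada", "tags": ["savings"]},
]

owner_index = build_index(accounts, "owner")  # ordinary single-value index
tags_index = build_index(accounts, "tags")    # multikey: one entry per array element
```

A lookup such as tags_index["savings"] finds both savings accounts, even though the value sits inside an array in each document.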

MongoDB's querying capabilities are well developed. If you're coming to MongoDB from the RDBMS world, the online documentation shows how SQL queries might be mapped to MongoDB operations. For example, in most cases, the equivalent of SQL's SELECT can be performed by a find() function. The find() function takes two arguments: a query document and a projection document. The query document specifies filter conditions on document fields, and only documents that satisfy them are fetched. You could use it to request that only documents with a quantity field whose contents are greater than, say, 100 be returned. In that sense, the query document corresponds to the WHERE clause in an SQL statement. The projection document identifies which fields are to be returned in the results, which allows you to request that, say, only the name and address fields of matching documents be returned from the query. The sort() function, which can be executed on the results of find(), corresponds to SQL's ORDER BY clause.
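In the mongo shell, the example above might be written as db.orders.find({quantity: {$gt: 100}}, {name: 1, address: 1}).sort({name: 1}), where the orders collection is hypothetical. The Python sketch below mimics the roles of the query and projection documents against an in-memory list. It models only equality and $gt, and unlike MongoDB it does not return the _id field by default.

```python
def matches(doc, query):
    """True if doc satisfies every condition in the query document."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Only the $gt comparison is modeled in this sketch.
            if "$gt" in condition and not (
                value is not None and value > condition["$gt"]
            ):
                return False
        elif value != condition:
            return False
    return True


def find(collection, query=None, projection=None):
    """Return matching documents, trimmed to the projected fields."""
    results = []
    for doc in collection:
        if not matches(doc, query or {}):
            continue
        if projection:
            doc = {k: v for k, v in doc.items() if projection.get(k)}
        results.append(doc)
    return results


# Hypothetical sample data.
orders = [
    {"_id": 1, "name": "Widget", "address": "12 Elm St", "quantity": 250},
    {"_id": 2, "name": "Bolt", "address": "9 Oak Ave", "quantity": 120},
    {"_id": 3, "name": "Nut", "address": "4 Pine Rd", "quantity": 40},
]

# Roughly: SELECT name, address FROM orders WHERE quantity > 100 ORDER BY name
rows = sorted(
    find(orders, {"quantity": {"$gt": 100}}, {"name": 1, "address": 1}),
    key=lambda d: d["name"],
)
```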

You can locate documents with the command db.<collection>.find(), possibly the simplest query you can perform. In the mongo shell, find() displays the first 20 documents of the result, but it also provides a cursor, which allows you to iterate through all the matching documents in the collection. If you'd like to navigate the results more directly, you can reference the elements of the cursor as though it were an array.

More complex queries are possible thanks to MongoDB's set of operators, each prefixed with a dollar sign, which can describe comparisons as well as Boolean combinations of conditions. MongoDB also provides the $regex operator in case you want to match document fields against regular expressions. The same operators can be used in the update() command to construct the MongoDB equivalent of SQL's UPDATE ... WHERE statement.
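As a rough illustration of how these operators compose, the following Python sketch evaluates a small subset of them ($gt, $lt, $regex, $and, $or) and applies $set and $inc changes to matching documents. It is a toy model with hypothetical data. Note that it modifies every matching document, whereas MongoDB's update() touches only the first match unless the multi option is set.

```python
import re

# A small subset of comparison operators; each takes (field value, argument).
COMPARISONS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$regex": lambda value, arg: isinstance(value, str)
    and re.search(arg, value) is not None,
}


def matches(doc, query):
    """True if doc satisfies the query, including nested $and / $or clauses."""
    for key, condition in query.items():
        if key == "$or":
            if not any(matches(doc, clause) for clause in condition):
                return False
        elif key == "$and":
            if not all(matches(doc, clause) for clause in condition):
                return False
        elif isinstance(condition, dict):
            value = doc.get(key)
            if not all(COMPARISONS[op](value, arg) for op, arg in condition.items()):
                return False
        elif doc.get(key) != condition:
            return False
    return True


def update(collection, query, change):
    """Apply $set and $inc to every matching document; return the count."""
    modified = 0
    for doc in collection:
        if matches(doc, query):
            doc.update(change.get("$set", {}))
            for field, amount in change.get("$inc", {}).items():
                doc[field] = doc.get(field, 0) + amount
            modified += 1
    return modified


# Hypothetical sample data.
inventory = [
    {"sku": "AB-100", "qty": 5, "status": "ok"},
    {"sku": "AB-200", "qty": 50, "status": "ok"},
    {"sku": "ZZ-300", "qty": 2, "status": "ok"},
]

# Roughly: UPDATE inventory SET status = 'reorder'
#          WHERE sku LIKE 'AB-%' AND qty < 10
count = update(
    inventory,
    {"sku": {"$regex": "^AB-"}, "qty": {"$lt": 10}},
    {"$set": {"status": "reorder"}},
)
```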

In the 2.2 release, MongoDB added the aggregation framework, which allows for calculating aggregated values without having to resort to MapReduce (which can be overkill if all you want to do is calculate a field's total or average). The aggregation framework provides functionality similar to SQL's SUM and AVG functions. It can also calculate computed fields and mimic the GROUP BY clause. Note that the aggregation framework is declarative -- it does not employ JavaScript. You define a chain of operations, much as you would chain commands in a Unix/Linux shell pipeline, and these operations are performed on the target documents in stream fashion.
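The pipeline idea can be imitated with Python generators, each stage consuming the stream produced by the one before it. This is a sketch of the concept, not of MongoDB's implementation, and the sales data and helper names are hypothetical. A comparable shell call might look like db.sales.aggregate([{$match: {status: "paid"}}, {$group: {_id: "$region", total: {$sum: "$amount"}, average: {$avg: "$amount"}}}]).

```python
from collections import defaultdict


def match(docs, predicate):
    """Pass through only the documents that satisfy the predicate."""
    return (doc for doc in docs if predicate(doc))


def group(docs, key, field):
    """Emit one summary per key value with the total and average of a field."""
    totals = defaultdict(lambda: [0, 0])  # key value -> [sum, count]
    for doc in docs:
        entry = totals[doc[key]]
        entry[0] += doc[field]
        entry[1] += 1
    for value, (total, count) in totals.items():
        yield {"_id": value, "total": total, "average": total / count}


def pipeline(docs, *stages):
    """Feed the documents through each stage in order, like a shell pipe."""
    for stage in stages:
        docs = stage(docs)
    return list(docs)


# Hypothetical sample data.
sales = [
    {"region": "west", "amount": 100, "status": "paid"},
    {"region": "west", "amount": 300, "status": "paid"},
    {"region": "east", "amount": 50, "status": "paid"},
    {"region": "east", "amount": 999, "status": "void"},
]

# Roughly: SELECT region, SUM(amount), AVG(amount) FROM sales
#          WHERE status = 'paid' GROUP BY region
summary = pipeline(
    sales,
    lambda docs: match(docs, lambda d: d["status"] == "paid"),
    lambda docs: group(docs, "region", "amount"),
)
```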

One of the more significant new features in MongoDB's 2.4 release is the arrival of text search. In the past, developers accomplished this by integrating Apache Lucene with MongoDB, which piled on considerable complexity. Adding Lucene in a clustered system with replication and fault tolerance is not an easy thing to do. MongoDB users now get text search for free. The new text search feature is not meant to match Lucene, but to provide basic capabilities such as more efficient Boolean queries ("dog and cat but not bird"), stemming (search for "reading" and you'll also get "read"), and the automatic culling of stop words (for example, "and", "the", "of") from the index.
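The three capabilities mentioned (Boolean queries, stemming, and stop-word removal) can be demonstrated with a deliberately naive Python sketch. The stemmer, stop-word list, and function names below are invented for illustration and bear no relation to the algorithms MongoDB actually uses.

```python
# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"and", "the", "of", "a", "is"}


def stem(word):
    """Very crude stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words, and stem what is left."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {stem(w) for w in words if w and w not in STOP_WORDS}


def build_text_index(docs):
    """Map each stemmed term to the IDs of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, all_ids, required=(), excluded=()):
    """Return IDs containing every required term and no excluded term."""
    hits = set(all_ids)
    for word in required:
        hits &= index.get(stem(word.lower()), set())
    for word in excluded:
        hits -= index.get(stem(word.lower()), set())
    return hits


# Hypothetical sample documents.
docs = {
    1: "The dog and the cat",
    2: "A dog, a cat and a bird",
    3: "Reading is fun",
    4: "I read the news",
}
index = build_text_index(docs)

pets = search(index, docs, required=["dog", "cat"], excluded=["bird"])
readers = search(index, docs, required=["reading"])
```

Here pets contains only document 1 ("dog and cat but not bird"), and readers contains documents 3 and 4 because "reading" and "read" reduce to the same stem. Stop words such as "and" never enter the index at all.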
