NoSQL showdown: MongoDB vs. Couchbase

MongoDB edges Couchbase Server with richer querying and indexing options, as well as superior ease of use

1 2 3 4 5 6 7 8 9 Page 8
Page 8 of 9

You can define a text index on multiple string fields, but there can be only a single text index per collection, and indexes do not store word proximity information (that is, how close words are to one another, which can affect how matches are weighted). In addition, the text index is fully consistent: when you update data, the index is also updated.

Ease-of-use features have been added to version 2.4 as well. For example, you can now define a "capped array" as a data element, which works sort of like an ordered circular buffer. If, for example, you're keeping track of the top 10 entries in a blog, using a capped array will allow you to add new entries, and (based on the specified ordering) previous entries will be removed to cap the array at 10 or whatever number you specify.

MongoDB 2.4 also has improved geospatial capabilities. For example, you can now perform polygon operations, which would allow you to determine if two regions overlap. The spherical model used in 2.4 is improved too; it now takes into account the fact that the earth is not perfectly spherical, so distance calculations are more accurate.

MongoDB mapreduce
In Couchbase Server, the mapreduce operation's primary job is to provide a structured query and information aggregation capability on the documents in the database. In MongoDB, mapreduce can be used not only for querying and aggregating results, but as a general-purpose data processing tool. Just as a mapreduce operation executes within a given bucket in Couchbase Server, mapreduce executes within a given collection in a MongoDB database. As in Couchbase Server, mapreduce functions in MongoDB are written in JavaScript.

You can filter the documents passed into the map function via comparison operators, or you can limit the number of documents to a specific number. This allows you to create what amounts to an incremental mapreduce operation. Initially, you run mapreduce over the entire collection. For subsequent executions, you add a query function that includes only newly added documents. From there, set the output of mapreduce to be a separate collection, and configure your code so that the new results are merged into the existing results.

Further speed/size trade-offs are possible by choosing whether the intermediate results (the output of the map function, sent to the reduce function) are converted to BSON objects or remain JavaScript objects. If you choose BSON, the BSON objects are placed in temporary, on-disk storage, so the number of items you can process is limited only by available disk space. However, if you choose JavaScript objects, then the system can handle only about 500,000 distinct keys emitted by the map function. But as there is no writing to disk, the processing is faster.

You have to be careful with long-running mapreduce operations, because their execution involves lengthy locks. As mentioned earlier, the system has built-in facilities to mitigate this. For example, the read lock on the input collection is yielded every 100 documents. The MongoDB documentation describes the various locks that must be considered -- as well as mechanisms to relieve the possible problems.

Managing MongoDB
Management access with the MongoDB database goes through the interactive mongo shell. Very much a command-line interface that lets you enter arbitrary JavaScript, it is nonetheless surprisingly facile. The MongoDB related commands are uncomplicated, but at the price of being dangerous if you're careless. For example, to select a database, you enter use <databasename>. But that command doesn't check for the presence of the specific database; if you mistype it and proceed to enter documents into that database, you might not know what's going on until you've put a whole lot of documents into the wrong place. The same goes for collections within databases.

Other useful command-line utilities are mongostat, which returns information concerning the number of operations -- inserts, updates, deletes, and so on -- within a specific time period. The mongotop utility likewise returns statistical information on a MongoDB instance, this time focusing on a specific collection. You can see the amount of time spent reading or writing in the collection, for instance.

In addition, 10gen provides the free cloud-based MongoDB Monitoring Service (MMS) which provides a monitoring dashboard for MongoDB installations. Built on the SaaS model, MMS requires you to run a small agent on your MongoDB cluster that communicates with the management system.

NoSQL deep dive: MongoDB vs. Couchbase
10gen's MongoDB Monitoring Service shows statistics -- in this case, for a replica set -- but management of the database is done from the command line.
1 2 3 4 5 6 7 8 9 Page 8
Page 8 of 9