You'll find no clear notion of transactions in MongoDB. Certainly, you cannot perform pure ACID transactions on a MongoDB installation. Database changes can be made durable if you enable journaling, in which case write operations are blocked until the journal entry is persisted to disk (as described earlier). And MongoDB defines the $atomic
isolation operator, which imposes what amounts to an exclusive-write lock on the document involved. However, $atomic
is applied at the document level only. You cannot guard multiple updates across documents or collections.
MongoDB indexing and queries
MongoDB's querying capabilities are well developed. If you're coming to MongoDB from the RDBMS world, the online documentation shows how SQL queries might be mapped to MongoDB operations. For example, in most cases, the equivalent of SQL's SELECT
can be performed by a find()
function. The find()
function takes two arguments: a query document and a projection document. The query document specifies filter operations on specific document fields that are fetched. You could use it to request that only documents with a quantity field whose contents are greater than, say, 100 be returned. Therefore, the query document corresponds to the WHERE
clause in an SQL statement. The projection document identifies which fields are to be returned in the results, which allows you to request that, say, only the name and address fields of matching documents be returned from the query. The sort()
function, which can be executed on the results of find()
, corresponds to SQL's ORDER BY
statement.
You can locate documents with the command db.<collection>.find()
, possibly the simplest query you can perform. The find()
command will return the first 20 members of the result, but it also provides a cursor, which allows you to iterate through all the documents in the collection. If you'd like to navigate the results more directly, you can reference the elements of the cursor as though it were an array.
More complex queries are possible thanks to MongoDB's set of prefix operators, which can describe comparisons as well as boolean connections. MongoDB also provides the $regex
operator in case you want to apply regular expressions to document fields in the result set. These prefix operators can be used in the update()
command to construct the MongoDB equivalent of SQL's UPDATE ... WHERE
statement.
In the 2.2 release, MongoDB added the aggregation framework, which allows for calculating aggregated values without having to resort to mapreduce
(which can be overkill if all you want to do is calculate a field's total or average). The aggregation framework provides functionality similar to SQL's SUM
and AVG
functions. It can also calculate computed fields and mimic the GROUP BY
operator. Note that the aggregation framework is declarative -- it does not employ JavaScript. You define a chain of operations, much in the same way you might perform Unix/Linux shell programming, and these operations are performed on the target documents in stream fashion.
One of the more significant new features in MongoDB's 2.4 release is the arrival of text search. In the past, developers accomplished this by integrating Apache Lucene with MongoDB, which piled on considerable complexity. Adding Lucene in a clustered system with replication and fault tolerance is not an easy thing to do. MongoDB users now get text search for free. The new text search feature is not meant to match Lucene, but to provide basic capabilities such as more efficient Boolean queries ("dog and cat but not bird"), stemming (search for "reading" and you'll also get "read"), and the automatic culling of stop words (for example, "and", "the", "of") from the index.
You can define a text index on multiple string fields, but there can be only a single text index per collection, and indexes do not store word proximity information (that is, how close words are to one another, which can affect how matches are weighted). In addition, the text index is fully consistent: when you update data, the index is also updated.
Ease-of-use features have been added to version 2.4 as well. For example, you can now define a "capped array" as a data element, which works sort of like an ordered circular buffer. If, for example, you're keeping track of the top 10 entries in a blog, using a capped array will allow you to add new entries, and (based on the specified ordering) previous entries will be removed to cap the array at 10 or whatever number you specify.
MongoDB 2.4 also has improved geospatial capabilities. For example, you can now perform polygon operations, which would allow you to determine if two regions overlap. The spherical model used in 2.4 is improved too; it now takes into account the fact that the earth is not perfectly spherical, so distance calculations are more accurate.
MongoDB mapreduce
mapreduce
operation's primary job is to provide a structured query and information aggregation capability on the documents in the database. In MongoDB, mapreduce
can be used not only for querying and aggregating results, but as a general-purpose data processing tool. Just as a mapreduce
operation executes within a given bucket in Couchbase Server, mapreduce
executes within a given collection in a MongoDB database. As in Couchbase Server, mapreduce
functions in MongoDB are written in JavaScript.
You can filter the documents passed into the map function via comparison operators, or you can limit the number of documents to a specific number. This allows you to create what amounts to an incremental mapreduce
operation. Initially, you run mapreduce
over the entire collection. For subsequent executions, you add a query function that includes only newly added documents. From there, set the output of mapreduce
to be a separate collection, and configure your code so that the new results are merged into the existing results.
Further speed/size trade-offs are possible by choosing whether the intermediate results (the output of the map
function, sent to the reduce
function) are converted to BSON objects or remain JavaScript objects. If you choose BSON, the BSON objects are placed in temporary, on-disk storage, so the number of items you can process is limited only by available disk space. However, if you choose JavaScript objects, then the system can handle only about 500,000 distinct keys emitted by the map
function. But as there is no writing to disk, the processing is faster.
You have to be careful with long-running mapreduce operations, because their execution involves lengthy locks. As mentioned earlier, the system has built-in facilities to mitigate this. For example, the read lock on the input collection is yielded every 100 documents. The MongoDB documentation describes the various locks that must be considered -- as well as mechanisms to relieve the possible problems.
Managing MongoDBmongo
shell. Very much a command-line interface that lets you enter arbitrary JavaScript, it is nonetheless surprisingly facile. The MongoDB related commands are uncomplicated, but at the price of being dangerous if you're careless. For example, to select a database, you enter use <databasename>
. But that command doesn't check for the presence of the specific database; if you mistype it and proceed to enter documents into that database, you might not know what's going on until you've put a whole lot of documents into the wrong place. The same goes for collections within databases.
Other useful command-line utilities are mongostat
, which returns information concerning the number of operations -- inserts, updates, deletes, and so on -- within a specific time period. The mongotop
utility likewise returns statistical information on a MongoDB instance, this time focusing on a specific collection. You can see the amount of time spent reading or writing in the collection, for instance.
In addition, 10gen provides the free cloud-based MongoDB Monitoring Service (MMS) which provides a monitoring dashboard for MongoDB installations. Built on the SaaS model, MMS requires you to run a small agent on your MongoDB cluster that communicates with the management system.
10gen's MongoDB Monitoring Service shows statistics -- in this case, for a replica set -- but management of the database is done from the command line.
In addition to the new text search and geospatial capabilities discussed above, MongoDB 2.4 comes with performance and security improvements. The performance enhancements include the working set analyzer. The idea is that you want to configure your system so that the working set -- that subset of a databas accessed most frequently -- fits entirely in memory. But it was not easy to figure out your working set or how much memory you need. The working set analyzer, which operates like a helper function, provides diagnostic output to aid you in discovering the characteristics of your working set and tuning your system accordingly. In addition, the JavaScript engine has been replaced by Google's open source V8 engine. In the past, the JavaScript engine was single-threaded. V8 permits concurrent mapreduce
jobs, as well as general speed improvements.
Finally, the Enterprise edition welcomes Kerberos-based authentication. In all editions, MongoDB now supports role-based privileges, which gives you finer-grained control over users' access and operations on databases and collections.
10gen's release of MongoDB 2.4 is accompanied by new subscription levels: Community, Basic, Standard, and Enterprise. The Community subscription level is free, but it's also free of any support. The other subscription levels provide varying support response times and hours of availability. In addition, the Enterprise subscription level comes with the Enterprise version of MongoDB, which has more security features and SNMP support. It has also undergone more rigorous testing.
Pros and cons: MongoDB 2.4
Pros |
|
Cons |
|
Mongo or Couch?
As usual, which product is the best choice depends heavily on the target application. Both are highly regarded NoSQL databases with outstanding pedigrees. On the one hand, MongoDB has spent much more of its lifetime as a document database, and its support for document-level querying and indexing is richer than that in Couchbase. On the other hand, Couchbase can serve equally well as a document database, a Memcached replacement, or both.
Happily, exploring either Couchbase or MongoDB is remarkably simple. A single-node system for either database server is easily installed. And if you want to experiment with a sharded system (and have enough memory and processor horsepower), you can easily set up a gang of virtual machines on a single system, and lash them together via a virtual network switch. The documentation for both systems is voluminous and well maintained. 10Gen even provides free online MongoDB classes for developers, complete with video lectures, quizzes, and homework.
This article, "NoSQL showdown: MongoDB vs. Couchbase," was originally published at InfoWorld.com. Follow the latest developments in application development,data management, cloud computing, and open source at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.