Review: MongoDB takes on the world

MongoDB 4.0 beefs up with global cloud clusters, multi-document ACID transactions, and HIPAA compliance


If you’ve built a medium-sized to large-scale web application in the last few years, you probably considered basing it on the open source LAMP or MEAN stack. The older LAMP stack uses the Linux operating system, Apache web server, MySQL relational database, and PHP programming language. MEAN uses the MongoDB NoSQL database, the Express back-end web application framework, the Angular application platform, and the Node.js JavaScript runtime. MEAN is essentially an end-to-end JavaScript stack. Linux isn’t explicitly mentioned in the acronym, but is usually the OS underneath Node.

In this review, I’ll discuss the MongoDB database, now at version 4. MongoDB is a highly scalable, operational database available in both open source and commercial enterprise versions, and it can be run on-premises or as a managed cloud service. The managed cloud service is called MongoDB Atlas.

MongoDB is far and away the most popular of the NoSQL databases. Its document data model gives developers great flexibility, while its distributed architecture allows for great scalability. As a result, MongoDB is often chosen for applications that must manage large volumes of data, that benefit from horizontal scalability, and that handle data structures that don’t fit the relational model.  

Because MongoDB is appropriate for a wide variety of use cases, it is often put forth as a replacement for relational databases. However, while freedom from rigid schema constraints is often beneficial, it’s important to keep in mind that no document database is a universal solution—not even MongoDB.

MongoDB origins

The company behind MongoDB was founded in 2007 as 10gen by a team that was behind DoubleClick, the Internet advertising company. The original motivation for the MongoDB database was to be able to handle the agility and scale required for Internet advertising. As an example of scale, DoubleClick served 400,000 ads per second in 2007, and struggled to perform with the existing databases of the time.

MongoDB is a document-based store that also has a graph-based store implemented on top of it. The other kinds of NoSQL databases are key-value stores and column-based stores. All kinds of NoSQL databases share the ability to scale out in ways that were not possible in the SQL relational databases of 2007, but the different varieties of NoSQL databases have different strengths, weaknesses, and use cases.

Some of the main NoSQL competitors to MongoDB as operational databases are Amazon DynamoDB (key-value store), Google Cloud Bigtable (column store), Google Cloud Datastore (document store), Redis (in-memory, key-value store), Couchbase (multi-model key-value and document store), DataStax/Cassandra (column store), and Azure Cosmos DB (multi-model including a SQL option as well as several NoSQL stores).

What is MongoDB?

MongoDB Inc. describes MongoDB as “a document database with the scalability and flexibility that you want with the querying and indexing that you need.” To parse that, we first need to understand the nature of a document database, which is one of the kinds of NoSQL designs.

Rather than storing strongly typed data in related normalized tables with fixed schemas like a relational database, a document database stores related data in de-normalized form embedded in JSON-like name-value documents. MongoDB doesn’t actually store JSON, however: MongoDB stores BSON (Binary JSON), which extends the JSON representation (strings) to include additional types such as int, long, date, floating point, decimal128, and geospatial coordinates, as shown in the diagram below. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data, and subdocuments. BSON also tracks the size of each document, to allow efficient seeking.


In this slide from a MongoDB webinar, Joe Drumgoole explains how the BSON storage format adds type information to a JSON document. These types can then be used to generate appropriate indexes as well as returning the correct type to programming languages when they issue queries.
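
To make the de-normalized shape concrete, here is a minimal sketch of one order held as a single document, written as a plain JavaScript object. The field names (customer, items, and so on) and the values are hypothetical, and a plain object only approximates BSON: types such as decimal128 need a driver's BSON classes, which I have left out.

```javascript
// Hypothetical order document: data that a relational design would split
// across orders, customers, and order_items tables lives in one structure.
const order = {
  _id: 1001,                                   // primary key field
  placedAt: new Date("2018-08-01T12:00:00Z"),  // stored as a BSON date
  customer: { name: "Ada", city: "Boston" },   // subdocument
  items: [                                     // array of subdocuments
    { sku: "A-1", qty: 2, price: 9.5 },
    { sku: "B-7", qty: 1, price: 20 },
  ],
};

// Reading the order needs no join: the line items travel with it.
const orderTotal = order.items.reduce(
  (sum, item) => sum + item.qty * item.price,
  0
);
console.log(orderTotal);
```

The trade-off is that the customer's details are copied into every order, so an update to them has to touch each copy.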

BSON typing feeds into the indexing of fields. MongoDB can generate multi-modal graph, geospatial, B-tree, and full text indexes on a single copy of the data, using the type of the data to generate the correct type of index. MongoDB lets you create indexes on any document field.
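
As a rough illustration of how the index type follows from the index specification, the sketch below defines three specifications for a hypothetical "places" collection and applies them with createIndex. The collection and field names are my own assumptions; the function only expects an object that exposes createIndex(spec), as collections do in the Mongo shell and the official Node.js driver.

```javascript
// Hypothetical index specifications for a "places" collection.
const indexSpecs = [
  { name: 1 },               // ascending B-tree index on a string field
  { location: "2dsphere" },  // geospatial index on GeoJSON coordinates
  { description: "text" },   // full text index
];

// `collection` is assumed to expose createIndex(spec).
async function createIndexes(collection) {
  const names = [];
  for (const spec of indexSpecs) {
    // createIndex returns the name of the index it created
    names.push(await collection.createIndex(spec));
  }
  return names;
}
```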


If you are familiar with SQL databases, you can use the SQL to MongoDB Mapping Chart to match the SQL concepts you know to MongoDB concepts.

MongoDB has databases, collections (tables), documents (rows), fields (columns), indexes, $lookup or embedded documents (joins), primary keys, an aggregation pipeline, and transactions. For better performance and to avoid needing multi-document transactions, you’ll probably want to use subdocuments and arrays in MongoDB rather than storing your data in normalized form as you would in a SQL database.
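
When the data does stay normalized, the join moves into the aggregation pipeline. The sketch below assumes two hypothetical collections, orders and customers, linked by a customerId field; none of these names come from MongoDB's documentation.

```javascript
// Hypothetical collections: each document in "orders" carries a customerId
// that matches the _id of a document in "customers".
const ordersWithCustomer = [
  {
    $lookup: {
      from: "customers",         // collection to join against
      localField: "customerId",  // field in orders
      foreignField: "_id",       // field in customers
      as: "customer",            // output array of matching customers
    },
  },
  // Turn the one-element array into a subdocument; orders without a
  // matching customer are dropped by this stage.
  { $unwind: "$customer" },
];

// With a live connection, this would run in the Mongo shell as:
//   db.orders.aggregate(ordersWithCustomer)
```

Embedding the customer as a subdocument in each order would avoid the $lookup stage entirely, which is the design the paragraph above recommends.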

MongoDB 4 does have multi-document transactions, which means that you can still get ACID properties even if you have to normalize your data design. Previous versions guaranteed atomicity only at the level of a single document.

For what it’s worth, MongoDB representatives told me that single-document transactions handle 90 percent of the use cases that need ACID properties. When customers needed ACID for multi-document transactions before version 4, they basically implemented it themselves at the application level.

By default, MongoDB uses dynamic schemas, sometimes called schema-less. The documents in a single collection do not need to have the same set of fields, and the data type for a field can differ across documents within a collection. You can change document structures at any time.

Schema governance is available, however. MongoDB has supported JSON schema validation since version 3.6. To turn it on, use the $jsonSchema operator in a collection's validator expression. Validation occurs during updates and inserts.
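
A validator might look something like the sketch below. The contacts collection and its fields are hypothetical, and the helper assumes only a database handle that exposes createCollection(name, options), as in the Mongo shell and the official Node.js driver.

```javascript
// Hypothetical schema for a "contacts" collection: name is required and must
// be a string; email, when present, must be a string containing "@".
const contactValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name"],
    properties: {
      name: { bsonType: "string", description: "required string" },
      email: { bsonType: "string", pattern: "@", description: "string containing @" },
    },
  },
};

// `db` is assumed to expose createCollection(name, options).
function createValidatedCollection(db) {
  return db.createCollection("contacts", { validator: contactValidator });
}
```

Fields that the schema does not mention are still accepted, so the collection keeps its flexibility where you have not chosen to constrain it.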

As you can see in the documentation snapshot and the MongoDB Atlas screenshot below, MongoDB has its own query language, implemented in the Mongo shell, in 12 supported language driver APIs (and many more from the community), and in the Compass GUI and the Atlas Collections tab (the Data Explorer). The MongoDB query language is not at all the same as SQL, but there is a more or less direct mapping between the two. I say “more or less” because relational databases don’t support embedded documents, but MongoDB does. That isn’t necessarily all good, as you’ll see in the next section.


In addition to the Mongo shell and Compass GUI, MongoDB officially supports queries in more than 10 programming languages, and unofficially many more through drivers developed by the community. The Mongo shell basically uses JavaScript notation, but automatically prints all results.


Here I am searching a food database for items with low saturated fats. The $lt operator of course means “less than.” I am using a public data collection in MongoDB Atlas that was shared with me by MongoDB Inc.
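
The screenshot's exact query isn't reproduced in text, so the sketch below only shows the general shape of such a filter. The field name saturated_fat_g, the collection name foods, and the sample documents are all made up for illustration, and the matcher is a tiny in-memory stand-in for the $lt comparison, not MongoDB's implementation.

```javascript
// Hypothetical field name; the real collection's schema isn't shown here.
const lowSaturatedFat = { saturated_fat_g: { $lt: 1 } };

// In the Mongo shell, this filter would be passed to find():
//   db.foods.find(lowSaturatedFat)

// In-memory stand-in for $lt, for illustration only.
function matchesLt(doc, filter) {
  return Object.entries(filter).every(
    ([field, condition]) => doc[field] < condition.$lt
  );
}

// Made-up sample documents.
const sampleFoods = [
  { name: "celery", saturated_fat_g: 0.1 },
  { name: "butter", saturated_fat_g: 51 },
];

const lowFatNames = sampleFoods
  .filter((doc) => matchesLt(doc, lowSaturatedFat))
  .map((doc) => doc.name);
console.log(lowFatNames);
```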

The MongoDB aggregation framework uses pipeline operators that are more or less the equivalent of the SQL GROUP BY and WHERE clauses. For example, the following query uses MongoDB’s user group database to list the past events and the total RSVPs for each event, in the Mongo shell:

> db.past_events.aggregate( [{'$match': {'batchID': 101, 'event.status': 'past', 'event.group.urlname': {'$in': ['Atlanta-MongoDB-User-Group', 'Austin-MongoDB-User-Group', 'Baltimore-MongoDB-Users-Group', 'Bangalore-MongoDB-User-Group', 'Belfast-MongoDB-User-Group', 'Bergen-NoSQL', 'Bordeaux-MongoDB-User-Group', 'Boston-MongoDB-User-Group']}}},
{'$group': {'_id': {'urlname': '$event.group.urlname', 'year': {'$year': '$event.time'}}, 'event_count': {'$sum': 1}, 'rsvp_count': {'$sum': '$event.yes_rsvp_count'}}},
{'$project': {'_id': 0, 'group': '$_id.urlname', 'year': '$_id.year', 'event_count': 1, 'rsvp_count': 1}}])

The query uses the aggregate method with the $match, $in, $group, $year, $sum, and $project operators and returns the following:

{ "event_count" : 2, "rsvp_count" : 27, "group" : "Boston-MongoDB-User-Group", "year" : 2017 }
{ "event_count" : 5, "rsvp_count" : 94, "group" : "Boston-MongoDB-User-Group", "year" : 2016 }
{ "event_count" : 5, "rsvp_count" : 231, "group" : "Boston-MongoDB-User-Group", "year" : 2015 }
{ "event_count" : 3, "rsvp_count" : 175, "group" : "Boston-MongoDB-User-Group", "year" : 2014 }
{ "event_count" : 10, "rsvp_count" : 489, "group" : "Boston-MongoDB-User-Group", "year" : 2013 }
{ "event_count" : 12, "rsvp_count" : 444, "group" : "Boston-MongoDB-User-Group", "year" : 2012 }
{ "event_count" : 2, "rsvp_count" : 118, "group" : "Boston-MongoDB-User-Group", "year" : 2011 }
{ "event_count" : 6, "rsvp_count" : 84, "group" : "Atlanta-MongoDB-User-Group", "year" : 2011 }
{ "event_count" : 3, "rsvp_count" : 74, "group" : "Baltimore-MongoDB-Users-Group", "year" : 2012 }
{ "event_count" : 1, "rsvp_count" : 5, "group" : "Bergen-NoSQL", "year" : 2015 }
{ "event_count" : 15, "rsvp_count" : 286, "group" : "Atlanta-MongoDB-User-Group", "year" : 2012 }
{ "event_count" : 11, "rsvp_count" : 321, "group" : "Baltimore-MongoDB-Users-Group", "year" : 2013 }
{ "event_count" : 8, "rsvp_count" : 124, "group" : "Bangalore-MongoDB-User-Group", "year" : 2015 }
{ "event_count" : 6, "rsvp_count" : 381, "group" : "Bangalore-MongoDB-User-Group", "year" : 2013 }
{ "event_count" : 7, "rsvp_count" : 242, "group" : "Bangalore-MongoDB-User-Group", "year" : 2012 }
{ "event_count" : 13, "rsvp_count" : 233, "group" : "Atlanta-MongoDB-User-Group", "year" : 2013 }
{ "event_count" : 10, "rsvp_count" : 171, "group" : "Baltimore-MongoDB-Users-Group", "year" : 2014 }
{ "event_count" : 3, "rsvp_count" : 28, "group" : "Austin-MongoDB-User-Group", "year" : 2017 }
{ "event_count" : 2, "rsvp_count" : 52, "group" : "Austin-MongoDB-User-Group", "year" : 2016 }
{ "event_count" : 1, "rsvp_count" : 8, "group" : "Atlanta-MongoDB-User-Group", "year" : 2018 }
Type "it" for more

MongoDB also has a mapReduce function. The Compass GUI has an aggregation pipeline builder that makes creating queries such as the one above fairly straightforward.

MongoDB supports a range of server data consistency levels, from read uncommitted up to causal consistency. Causal consistency was added in version 3.6 and is available through client sessions. The client sets read and write concerns to specify the desired consistency level.
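
Here is a minimal sketch of how a client might ask for these guarantees. It assumes a connected MongoClient from a recent version of the official Node.js driver (older drivers spell some options differently), and the inventory database, items collection, and document fields are hypothetical.

```javascript
// `client` is assumed to be a connected MongoClient.
async function writeThenRead(client) {
  // Operations in a causally consistent session are ordered with respect
  // to one another.
  const session = client.startSession({ causalConsistency: true });
  try {
    const items = client.db("inventory").collection("items");
    await items.insertOne(
      { sku: "A-1", qty: 5 },
      { session, writeConcern: { w: "majority" } }
    );
    // With majority read and write concerns in the same session, this read
    // reflects the insert above.
    return await items.findOne(
      { sku: "A-1" },
      { session, readConcern: { level: "majority" } }
    );
  } finally {
    await session.endSession();
  }
}
```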

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document. When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic. Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets, at a cost in performance.
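
To show what a multi-document transaction looks like from the client side, here is a rough sketch of a funds transfer between two documents. It assumes a MongoClient from the official Node.js driver connected to a replica set running MongoDB 4.0 or later; the bank database, accounts collection, and balance field are hypothetical. Production code would also retry on transient errors, which I have omitted.

```javascript
// `client` is assumed to be a connected MongoClient (replica set, 4.0+).
async function transfer(client, fromId, toId, amount) {
  const session = client.startSession();
  const accounts = client.db("bank").collection("accounts");
  session.startTransaction({
    readConcern: { level: "snapshot" },
    writeConcern: { w: "majority" },
  });
  try {
    try {
      // Both updates carry the session, so they join the same transaction.
      await accounts.updateOne({ _id: fromId }, { $inc: { balance: -amount } }, { session });
      await accounts.updateOne({ _id: toId }, { $inc: { balance: amount } }, { session });
    } catch (err) {
      await session.abortTransaction(); // neither update is applied
      throw err;
    }
    await session.commitTransaction();  // both updates become visible together
  } finally {
    await session.endSession();
  }
}
```

Without the transaction, a failure between the two updateOne calls would leave the debit applied and the credit missing, because each call is atomic only for its own document.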

At a Glance
  • MongoDB is a scalable, distributed, document database available for on-premises deployment and as a globally distributed cloud service.

    Pros

    • Now has ACID properties for multi-document transactions
    • Available as a globally distributed cloud cluster (MongoDB Atlas)
    • Supports a range of consistency levels, up to causal consistency
    • Has client drivers for more than 10 programming languages
    • Has a serverless back end as a service (MongoDB Stitch)

    Cons

    • Not a substitute for a relational database
    • Sharded MongoDB databases don’t yet support multi-document transactions