NoSQL databases have ridden the same hype cycle as any new technology. Some new feature or capability -- seamless multiserver scaling, in this case -- spawns a host of new startups.
None of them begins with a mature product. When these startups are quizzed about missing features that incumbent products have had for years, the answer is usually something like “a lot of people don’t need that.” Then they implement that stuff and pull a Steve Jobs -- “Now with cut and paste!” -- causing fans to marvel at their brilliance and party like it's 1979.
Case in point: MongoDB 3.0, the leading NoSQL database, now has document-level locking, which is analogous to an RDBMS offering row-locking, a feature that first appeared in commercial databases in the 1980s.
But according to director of products Kelly Stirman, document-level locking is not the biggest feature in MongoDB 3.0, which is supposed to be generally available in early March. The really big deal, he says, is the new, pluggable storage API. This means that, as with MySQL, you can swap out the storage engine behind the database. The company hopes an ecosystem will emerge around this, but I've heard that one before. (You could argue that pluggable storage APIs have allowed MySQL to evolve and survive rather than create an ecosystem of storage engines that people actually use.)
To most users, the big bang will be the WiredTiger storage engine.
Hear the WiredTiger roar
MongoDB brings us WiredTiger via an acquisition made back in December. WiredTiger was created by Sleepycat Berkeley DB architects Dr. Michael Cahill and Keith Bostic. For applications where reads do not drastically outnumber writes, WiredTiger will offer superior performance to MongoDB's default MMAPv1 engine.
For the geeky among you, WiredTiger offers both B-tree and log-structured merge (LSM) algorithms. Predictably, and according to benchmarks, B-tree performs better with large caches, while LSM stays consistently good even when the data cannot be sufficiently cached. WiredTiger won't do much for your best-case performance, but it will rein in your worst-case latency -- which is what most people actually care about.
If that's not enough, consider WiredTiger's compression prowess. For what Stirman estimates is about a 5 percent CPU overhead, you can save about 70 percent disk space, thanks to Google’s Snappy compression library.
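Enabling WiredTiger and its Snappy compression can be sketched in a minimal mongod configuration file like the one below. The `dbPath` is hypothetical; the engine and compressor settings follow MongoDB 3.0's YAML config format.

```yaml
# Minimal MongoDB 3.0 config sketch: WiredTiger with Snappy block compression.
storage:
  dbPath: /var/lib/mongodb   # hypothetical data directory
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: snappy
```

The same choice can be made on the command line with `mongod --storageEngine wiredTiger`.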
The default in MongoDB 3.0 will still be the familiar MMAPv1 engine from the previous release. This has been beefed up to allow collection-level locking, which is roughly equivalent to table-level locking in an RDBMS. Previous releases offered database-level write locking.
Migrating should be simple if you stick with the MMAPv1 engine; according to the company, the on-disk format is drop-in compatible with the previous release. If you're moving to WiredTiger, you can create a replica while the database is live and migrate as you see fit. Stirman says future releases will probably make WiredTiger the default storage engine.
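The replica-based migration boils down to a rolling re-sync. A hedged sketch, assuming a replica set named "rs0" and a hypothetical data path -- MMAPv1 data files are not readable by WiredTiger, so each member must sync fresh:

```shell
# 1. Take one secondary out of service and clear its data directory
#    (WiredTiger cannot read MMAPv1 files; the member will re-sync).
# 2. Restart it with the new engine; it performs an initial sync
#    from the primary while the database stays live:
mongod --replSet rs0 --dbpath /var/lib/mongodb --storageEngine wiredTiger

# 3. Confirm the member has returned to SECONDARY state, then repeat
#    for each member, stepping the primary down last:
mongo --eval 'printjson(rs.status().members.map(function(m) { return m.stateStr; }))'
```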
Memory is king
As we all know, disks are for losers. Achieving real speed is for players like us who, instead of lying about dropping our coins on a G5 like some lame autotune rapper, blow real money on enough RAM for all of our data.
If that's how you roll, MongoDB 3.0 has an in-memory storage system. It isn’t considered production ready, but no self-respecting hipster hacker would ever use code the vendor considered production ready. For the record, I remain old-school and cannot afford to build systems with a petabyte of RAM because I left my wallet in El Segundo.
If you've purchased MongoDB Enterprise, then MongoDB 3.0 promises to address the objection that MongoDB is easy to get started with, but difficult to administer at scale.
The new Ops Manager is an installable version of what MongoDB already offers in its cloud-based MMS: a Web-based graphical management tool for monitoring, provisioning, backup, and upgrades. The Enterprise edition also offers what Stirman refers to as end-to-end auditing.
These niceties aren’t limited to the Enterprise version. MongoDB tools (such as mongodump, mongorestore) have also been rewritten to be smaller and to parallelize operations. Moreover, logging has been improved to make it easier to control what does and doesn’t get logged by the database.
Meanwhile, for what Stirman estimates is the fourth or fifth release in a row, new geospatial features have debuted -- this time, "big polygon" support for $geoWithin queries, for when one hemisphere cannot contain the greatness of your query shape!
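A "big polygon" query looks like an ordinary $geoWithin filter, except the geometry names MongoDB's custom strict-winding CRS, which lets a single ring cover more than half the sphere. A sketch of such a filter document, with a hypothetical field name ("loc"):

```python
# Hedged sketch of a MongoDB 3.0 "big polygon" $geoWithin filter.
# The "loc" field and the coordinates are hypothetical; the CRS name is
# the one MongoDB defines for polygons larger than a hemisphere.
big_polygon_filter = {
    "loc": {
        "$geoWithin": {
            "$geometry": {
                "type": "Polygon",
                # One closed ring spanning well over half the globe; under
                # the strict-winding CRS, vertex order decides which side
                # of the ring counts as the interior.
                "coordinates": [[
                    [-130.0, 60.0], [-130.0, -60.0], [130.0, -60.0],
                    [130.0, 60.0], [-130.0, 60.0],
                ]],
                "crs": {
                    "type": "name",
                    "properties": {
                        "name": "urn:x-mongodb:crs:strictwinding:EPSG:4326"
                    },
                },
            }
        }
    }
}
```

With a driver such as PyMongo, this document would be passed straight to a find() call on the relevant collection.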
The future of MongoDB
I was not entirely shocked when I asked Stirman what might be on the road map for the future. He postulated that the company might take on the ability to add constraints -- requirements such as enforcing that a person document has at least one phone number.
MongoDB is also hoping the community contributes new storage engines. For example, vendors such as Fusion-io have custom APIs that bypass the file system and write directly to flash. This made me think of Oracle raw partitions and such.
The last MongoDB release, version 2.6, was hyped as the "biggest release" yet, but it clearly pales in comparison with this one. Late to the party it may be, but no one should undervalue the importance of a locking policy that is actually useful in an operational system that isn't almost read-only.
I have little doubt MongoDB 3.0 will convince more operations folks to join the MongoDB camp. This MongoDB is the most Web-scale yet.