It's impossible to separate the NoSQL trend from 10gen's MongoDB. Yes, there are all sorts of NoSQL databases, as InfoWorld's Andrew Oliver detailed in his classic "Which freaking database should I use?" And it's notoriously difficult to determine market share among open source offerings like MongoDB and its competitors, such as Couchbase or Cassandra. Nonetheless, few would dispute that MongoDB has become the darling of a new generation of developers, who discovered early in the product's history that it simplified the creation of Web applications and enabled much easier database scaling than traditional RDBMSes.
10gen CEO Dwight Merriman co-founded the company in 2007 -- shortly after he sold his first venture, DoubleClick, to Google for $3.1 billion. Merriman comes from a hardcore technology background: He was CTO of DoubleClick from 1995 to 2005, and he designed the original technology for DART (Dynamic Advertising, Reporting, and Targeting), the dominant ad-serving technology of the Web. He launched the MongoDB open source project "because we really felt like there was a need for kind of a new class of data technologies, and the time was right for change." At 10gen's founding, MongoDB became a commercial open source project, with a choice of subscription tiers that include support and a special subscriber edition.
To get a better sense of the appeal and the trajectory of MongoDB, InfoWorld executive editor Doug Dineley and I interviewed Merriman last week. We began by asking Merriman where he got the inspiration to invent the document database.
Q: Can you draw a direct line between your experience at DoubleClick and realizing a need for a different kind of database?
A: Yes. In a way, MongoDB is the kind of database product I wish I'd had then. Because we were dealing with thousands of servers and a dozen data centers all over the world, and it could never be down. It was very hard. And of course, computers were also a thousand times slower in the late '90s than they are now because of Moore's Law. It really felt like reinventing the wheel every time. We wanted to create something that fit well with the way we write software today and with the scale of data we work with today -- and fits well with the cloud layer, whether it be private or public cloud. We couldn't really find anything, so about five years ago, at the beginning of 10gen, we started creating MongoDB from scratch.
Q: If you could point your finger at the two bottlenecks of conventional database technology, they would be scaling out and the controls around RDBMS. Were those the two biggest points of pain?
A: There's scaling out -- then the other one is the data model. You know, the relational database is probably the most successful technology in the history of software. We're using inventions that are 30 or 40 years old. But if you look at the other tools we use to build software, the programming languages we use have changed in a much faster cycle. The software development methodologies have changed. We're not doing waterfall software development anymore. We're doing agile development and lots of releases. Relational databases weren't designed for a world with object-oriented programming languages and some of these new languages we have for cloud computing. We think there is going to be a very big inflection point at the data layer, the biggest in the last 25 years in terms of databases.
Q: You created the first widely successful NoSQL document database. What did you base that architecture on?
A: It wasn't based on anything in particular; it's a career's worth of learning what works and what doesn't. We were looking at cloud computing -- at needs for horizontal scalability and how we wanted to write code -- and we couldn't find tools that did what we wanted. There are a couple of dimensions in which things are very difficult. One is: Just how do you scale out? There are a couple of aspects of scale that are superhard theoretically, and one of them is distributed joins. If you want to do distributed joins on a 1,000-server cluster, that's a hard problem.
Our point of view on that was to say: Well, I don't have a clever way to do distributed joins, so instead, we're not going to do them. We're going to try to pick a data model that allows us to create something that's very useful without having them. That kind of leads into the document-oriented data model, which we like a lot anyway for development purposes. We thought it was a really good fit in terms of the way people code today.
That was sort of like the catalyst that kind of got us going in this direction. I think if you look at the NoSQL space -- next-generation, horizontally scalable, nonrelational databases -- they basically all have those properties. I really like document orientation as a concept, because it fits well with the way we write code. It's pretty readable, especially for developers and DBAs. One of the big ideas in databases is separating the data from the code. I should be able to look at the contents of the database without hurting the program. I think that's maintained here, and I think relational did that well too.
I also like JSON (JavaScript Object Notation) a lot as a basis for documents. JSON gives us a standards-based, language-independent format for object-style data, and I find it easier to read as a human than, say, XML. Several of the NoSQL products are JSON-style, document-oriented databases. I think that's really kind of the sweet spot there in terms of the data model that's going to be somewhat standardized in that space.
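To make the document idea concrete, here is a minimal sketch in plain Python using only the standard json module. It is not MongoDB's API, and the blog-post shape and every field name are hypothetical. It shows one record carrying its related data (author, tags, comments) inside itself, so reading it back needs no join.

```python
import json

# Hypothetical blog post stored as a single document.
# All field names and values are illustrative, not from the interview.
post = {
    "title": "Why document databases",
    "author": {"name": "Ada", "email": "ada@example.com"},
    "tags": ["nosql", "json"],
    "comments": [
        {"user": "bob", "text": "Nice overview."},
        {"user": "eve", "text": "What about joins?"},
    ],
}

# Serialize to JSON text and parse it back; the structure round-trips unchanged.
text = json.dumps(post, indent=2)
restored = json.loads(text)

# The comments are embedded in the post, so listing commenters is a plain
# traversal of one document rather than a join across tables.
comment_users = [c["user"] for c in restored["comments"]]

print(text)
print(comment_users)
```

In a relational design the comments would typically live in their own table and be joined to the post at read time; embedding them is the trade-off Merriman describes.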
Q: Why make MongoDB open source? Did you have a business model in mind from the very start?
A: I think it was for several reasons. One is we like open source conceptually, as developers. We're just fans of it. We think it makes a lot of sense. But in addition, we think you can build great businesses in the open source world that are complementary to the free project. Red Hat is a great company and a pretty big company.
Q: Who were your first customers?
A: Our first customers were from the Web 2.0 and startup world. That was kind of back in 2009, so you have folks like Shutterfly or Craigslist or Foursquare using MongoDB. Then a year later we saw bigger enterprises using the product, folks like Intellisponse or O2 or Disney or eBay. And now even at the enterprise level we're getting beyond the early-adopter phase. In 2012 the biggest trend I saw in terms of adoption was financial services, where banks and other financial firms were adopting MongoDB and NoSQL in general quite widely. They use it for new projects. They have all this legacy stuff, of course, but the majority of them are now using NoSQL for at least some percentage of their new projects. There are some organizations that are saying: This is our default way to build apps.
Q: Web applications, right?
A: Well, no, because in MongoDB there's nothing that's that specific to Web apps. Conceptually, it's a general-purpose database. True, the early adopters were from the Web, but I think people use it for everything: for content management systems, for personalization systems, for streaming mobile. They use it for Web, but also for accounting stuff and on the offline analytic side, where you have large repositories of historical data. It's pretty broad.
Q: Are there certain common characteristics of these enterprise apps? You make it sound like a hodgepodge.
A: It's very broad. It's kind of like if we were to ask: What's the use case for Oracle RDBMS? I can give you a good answer, but it might be a long answer. A document database certainly works well when the shape of the data fits with document-oriented data models, either from the programming languages or the types of columns or the datasets you're dealing with.
For example, one telco wrote a product catalog application for their company, a giant company with 100,000 products. Some of them are phones, some of them are extended warranties, and some of them are service plans. They have all these different properties to their products. They found it was very easy to do that with MongoDB because of the way the data model works. That's a nice example of a use case.
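A rough illustration of why a mixed catalog like that suits documents, again as a sketch in plain Python rather than MongoDB's API. The SKUs, field names, and the find helper are all hypothetical. Each product carries only the properties that apply to it, and a simple equality filter still works across the whole collection.

```python
# Hypothetical catalog: three product types with different properties.
catalog = [
    {"sku": "P-100", "type": "phone", "name": "Example Phone",
     "screen_inches": 4.0, "colors": ["black", "white"]},
    {"sku": "W-200", "type": "warranty", "name": "Extended Warranty",
     "term_months": 24, "covers": ["P-100"]},
    {"sku": "S-300", "type": "service_plan", "name": "Unlimited Talk",
     "monthly_price": 49.99},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion.

    A document that lacks a requested field simply does not match.
    """
    return [d for d in docs
            if all(k in d and d[k] == v for k, v in criteria.items())]


phones = find(catalog, type="phone")
two_year_warranties = find(catalog, type="warranty", term_months=24)
with_term = find(catalog, term_months=24)  # only warranties have this field

print([d["sku"] for d in phones])
print([d["sku"] for d in two_year_warranties])
```

A relational schema would usually need either one wide table with many empty columns or a separate table per product type; here a new product type is just a document with different keys.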
I think some other sweet spots would be the back end to content management applications, lots of usage for mobile applications, lots of usage for online applications that need an operational data store that's real time, with tens of thousands of reads and writes per second.
I think one area where it would not be used would be for supercomplex transactions, nor would it be a good fit when you have legacy requirements for SQL or UVC, for example.
Q: At the CIO level I think there's probably a widespread perception that NoSQL is no good for transactions at all. Would you like to give the counterargument to that?
A: It depends on the product. Different products have different consistency models. MongoDB is a little bit more in the strong consistency view of the world. In MongoDB you can do atomic operations on individual JSON documents, and those documents can be quite rich. That's your transactional scope. Within a single document, if you want to debit A and credit B, as a transaction, you can do that. However, you can't do that across documents.
We'd like for you to be able to do that, but the problem is that in distributed transactions that are fully generalized on 1,000-server clusters, it's very, very hard to make those fast. So what we've done is ... we're giving you what we can make fast. It turns out that will get you pretty far, especially if you take into consideration that you're getting the schema design. I find that in 75 percent of use cases there's a mass transactionality that ends up being a great way to solve a problem. But there's some minority where you would say: No, I want something different.
If there are 20 requirements for a project, the classic database would be superstrong on the distributed transaction requirement, but then it's going to be weaker on some of these other things, like ease of the data mapping and speed and scale, for example. You have to kind of sum it up and kind of decide what's the right product for the right use case.
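The single-document transactional scope Merriman describes in this answer can be sketched with a toy in-memory store in plain Python. This is an assumption-laden illustration, not MongoDB's implementation or API: the DocumentStore class, update_one, and transfer are hypothetical names, and one store-wide lock stands in for whatever the real database does internally. The point is that a debit and a credit inside one document either both apply or neither does.

```python
import copy
import threading


class DocumentStore:
    """Toy store: every update to ONE document is all-or-nothing."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # simplification: one lock for the store

    def insert(self, doc_id, doc):
        with self._lock:
            self._docs[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id):
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update_one(self, doc_id, mutate):
        """Apply mutate() to a private copy; publish it only on success."""
        with self._lock:
            draft = copy.deepcopy(self._docs[doc_id])
            mutate(draft)  # if this raises, the stored document is untouched
            self._docs[doc_id] = draft


def transfer(amount):
    """Build a mutation that debits A and credits B inside one document."""
    def mutate(doc):
        doc["balances"]["A"] -= amount
        doc["balances"]["B"] += amount
        if doc["balances"]["A"] < 0:
            raise ValueError("insufficient funds in A")
    return mutate


store = DocumentStore()
store.insert("cust-1", {"balances": {"A": 100, "B": 0}})

store.update_one("cust-1", transfer(30))
after_ok = store.get("cust-1")

try:
    store.update_one("cust-1", transfer(500))
    failed = False
except ValueError:
    failed = True
after_fail = store.get("cust-1")  # unchanged by the failed transfer

print(after_ok, failed, after_fail)
```

Nothing in this sketch coordinates a change across two different documents, which mirrors the limitation stated above: the transactional scope is the individual document.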
Q: A while ago, Andrew Oliver, one of our contributors, wrote an article called "Ill-informed haters go after MongoDB." His point was that due to the popularity of MongoDB a lot of unrealistic expectations were raised. Where do you think some of that backlash came from?
A: We did see some of that, especially a year ago, but a lot less lately. I think it's a combination of the product getting better and the knowledge base of developers in the world increasing constantly. All the best practices and design patterns that are good for MongoDB, the best ways to use it in a design, are not going to be automatic. With MongoDB it's very easy to get started, so I think there's a little bit of temptation to dive in without reading how to fly the airplane, but we see less of that now.
In the spirit of that, in the fall, we launched free online MongoDB education classes, which follow the massive open online course model, kind of similar to a Coursera or the Stanford classes, and it went really well. After two weeks we had 30,000 enrollments in that semester for those two classes we did. We did a developer class and a DBA class, and they were super well received and there was tons of participation. We're just beginning semester two now. We're doing a new class for Java developers. If you put information in people's hands you can help them be successful.
Q: Do you hire your best students?
A: Yeah, we'd like to.
Q: It's getting to be a pretty competitive landscape out there, with Couchbase's offering a document database version. What's your strategy in competing in such a chaotic, fast-growing area?
A: It feels a little bit to me like the early days of relational databases, where you had Oracle and Sybase and...
Q: Right. You don't want to be Sybase.
A: Yeah. But it's early days. There's lots of competition and lots of big projects and companies are doing really well and growing superfast. The most unique thing about MongoDB is the combination of scale with developer productivity: scalability and agility. I think we're unique in giving you both at the same time.
Q: Who do you find that you actually compete with?
A: Cassandra, Couchbase, and HBase would be the first three that come to mind.
Q: Do you see Oracle NoSQL at all?
A: I don't see it.
Q: One last question. What should we expect from 10gen in 2013?
A: We're the makers of MongoDB. Every time we do a big release, we increment by 0.2; there'll be a 2.4 release coming soon, then a 2.6 release. We're doing some work in the area of full-text search. We're adding a lot of new security features, along with lots and lots of performance-tuning features. Improved cluster management for very large clusters is something else we're working on. If you have a 500-server cluster, each MongoDB server is pretty easy to administer, but when you multiply that by 500, that's quite a bit of work, and we want to make sure that's as easy as possible. Those are some of the things we'll be releasing over the next 12 months.
This article, "10gen CEO: Why we're the NoSQL leader," originally appeared at InfoWorld.com. Read more of Eric Knorr's Modernizing IT blog.