For years we’ve seen the database market split between the traditional relational database and new-school NoSQL databases. According to Gartner, however, these two worlds are heading toward further consolidation. As Gartner analyst Nick Heudecker opines, “Each week brings more SQL into the NoSQL market subsegment. The NoSQL term is less and less useful as a categorization.”
Yet that promised “consolidation” may not be all that Gartner predicts. If anything, we may be seeing NoSQL databases—rich in flexibility, horizontal scalability, and high performance—don enough of the RDBMS’s SQL clothing to ultimately displace the incumbents. But the “NoSQL vendor” most likely to dominate over the long term may surprise you.
NoSQL: Wrong name, right idea
“NoSQL” has always been something of a misnomer, both because it purports to exclude SQL and because it lumps very different databases together under a single label. A graph database like Neo4j, for example, is completely different from a columnar database like Cassandra.
What they share, however, is a threefold focus, as Kelly Stirman, CMO at a stealth analytics startup and former MongoDB executive, told me in an interview. In his words, “NoSQL introduced three key innovations that the market has embraced and that the traditional vendors are working to add: 1) flexible data model, 2) distributed architecture (critical for cloud), and 3) flexible consistency models (critical for performance).”
Each element was critical to enabling modern, increasingly cloud-based applications, and each has presented traditional RDBMSes with a host of problems. Yes, most RDBMSes have implemented good-enough-but-not-great flexible data models. Yes, they’re also attempting flexible consistency models, with varying levels of (non)success. And, yes, they’re all trying to embrace a distributed architecture and finding it a brutally tough slog.
Even so, these attempts by the RDBMSes to become more NoSQL-like have led to what DataStax chief evangelist Patrick McFadin, in a conversation with me, called a “great convergence” that ultimately yields “multimodel” databases. Importantly, McFadin continued, this same convergence is taking place among the NoSQL databases as they add various components of the RDBMS in an attempt to achieve massive mainstream adoption.
But make no mistake, such convergence is not without its problems.
As Rohi Jain, CTO at Esgyn, describes it:
It is difficult enough for a query engine to support single operational, BI, or analytical workloads (as evidenced by the fact that there are different proprietary platforms supporting each). But for a query engine to serve all those workloads means it must support a wider variety of requirements than has been possible in the past. So, we are traversing new ground, one that is full of obstacles.
This struggle to have one data model rule them all afflicts the RDBMS more than NoSQL, Mat Keep, director of product and market analysis at MongoDB, told me: “Relational databases have been trying to keep up with the times as well. But most of the changes they’ve made have been stopgaps, adding new data types rather than addressing the core inflexibility of the relational data model, for example.”
Meanwhile, he notes, “Our customers have a desire to stop managing many special snowflakes and converge on a single, integrated platform that provides all the new capabilities they want with the reliability and full features that they need.” DataStax has been doing the same with Cassandra. Both companies are expanding their NoSQL footprints with support for the likes of graph databases, while also going deeper on SQL with connectors that translate SQL queries into a language that document and columnar databases can understand.
None of these efforts really speaks to NoSQL’s long-term advantage over the venerable RDBMS. Everybody wants to speak SQL because that’s where the primary body of skills resides, given decades of enterprise build-up around SQL queries. But the biggest benefit of NoSQL, and the one that RDBMSes have failed to master, according to Stirman, is its distributed architecture.
Jared Rosoff, chief technologist of Cloud Native Apps at VMware, underlines this point: “Even if all the databases converged on SQL as query language, the NoSQL crowd benefits from a fundamentally distributed architecture that is hard for legacy engines to replace.” He continues, “How long is it going to take to get MySQL or Postgres or Oracle or SQL Server to support a 100-node distributed cluster?”
Though both the RDBMS and NoSQL camps face challenges with convergence, Rosoff argues, “It’s way easier for the NoSQL crowd to become more SQL-like than it is for the SQL crowd to become more distributed,” and “a fully SQL compliant database that doesn’t scale that well” will be inferior to “a fully distributed database that supports only some subset of SQL.”
In short, SQL is very useful but replaceable. Distributed computing in our big data world, quite frankly, is not.
Winner take some
In this world of imperfect convergence, NoSQL seems to have the winning hand. But which NoSQL vendor will ultimately dominate?
Early momentum goes to MongoDB and DataStax-fueled Cassandra, but Stirman suggests a different winner entirely:
What the market really wants is an open source database that is easy to use and flexible like MongoDB, scales like Cassandra, is battle-hardened like Oracle, all without changing their security and tooling. MongoDB is best positioned to deliver this, but AWS is most likely to capture the market long term.
Yes, AWS, the same company that most threatens to own the Hadoop market, not to mention enterprise infrastructure generally. Amazon, the dominant force in the public cloud, is best positioned to capitalize on the enterprise shift toward the cloud and the distributed applications that live there. Database convergence, in sum, may ultimately be Bezos’ game to lose.