Why Microsoft's Cosmos DB represents the future of cloud databases

Here are four reasons why there's more to Microsoft's new database as a service than minimal management and greater flexibility

At first glance, Microsoft's new Cosmos DB Azure database seems like a rebadged successor to Azure's planet-scale NoSQL offering, DocumentDB. It's easy to read Cosmos DB as a point-revision version of its predecessor, down to the fact that existing DocumentDB users will be automigrated.

But what's most important about Cosmos DB is not where it's coming from, but where it's heading—and how it may be taking a sizable slice of the cloud-native database world with it. Here are four reasons why Cosmos DB is a harbinger of what's to come for cloud-native database technology and how it's a sign of what's already arrived.

1. Every major cloud vendor will need to compete with similar options

Here, "similar" means a single database as a service that offers familiar database metaphors (such as SQL), high consistency and availability, horizontal scale, and minimal management hassle. 

Right now the most direct competition in terms of total feature set is the newly unveiled Google Cloud Spanner. Amazon has multiple offerings, but each provides only part of the picture: managed conventional databases (Amazon RDS), NoSQL (Amazon DynamoDB), and a data warehouse (Amazon Redshift). IBM is in a roughly similar position; it has various options for different use cases, but no single product can fit the whole bill.

On the other hand, a one-size-fits-all offering could be overkill. Not everyone needs to have scale-out, zero maintenance, or high availability out of the box for their first project. But it's a draw to have those features when you need them without switching systems—and it's even better if the one-for-all solution can do so without any major downsides.

2. Committing to one model of consistency is on the way out

Databases have long been distinguished by their model for consistency. You could choose strong consistency (conventional SQL) at the cost of scale, or you could choose eventual consistency (NoSQL) and get enhanced scale, though at the cost of consistency across nodes.

With Cosmos DB, Microsoft offers multiple consistency models in the same database, so the choice of model can be a function of the workload rather than the product. Cloud Spanner and CockroachDB both attempt to provide horizontal scale without sacrificing strong consistency, but they don't offer a mechanism for choosing a compromise between the two when it makes sense.
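To make the trade-off concrete, here is a minimal sketch of a store that lets each read pick its own consistency model. It is a toy, not the Cosmos DB API: the class `ToyReplicatedStore`, its methods, and the two level names are assumptions made up for illustration, and real systems replicate across many nodes rather than one primary and one replica.

```python
class ToyReplicatedStore:
    """Hypothetical two-node store: one primary, one lazily updated replica."""

    def __init__(self):
        self.primary = {}   # always holds the latest acknowledged writes
        self.replica = {}   # catches up only when replicate() runs
        self.pending = []   # writes not yet applied to the replica

    def write(self, key, value):
        # Writes are acknowledged by the primary and queued for the replica.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes to the replica in the order they were made.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key, consistency="strong"):
        # The caller, not the product, decides which guarantee it needs.
        if consistency == "strong":
            return self.primary.get(key)
        if consistency == "eventual":
            return self.replica.get(key)
        raise ValueError(f"unknown consistency level: {consistency}")


store = ToyReplicatedStore()
store.write("cart:42", ["book"])

fresh = store.read("cart:42", consistency="strong")    # served by the primary
stale = store.read("cart:42", consistency="eventual")  # replica has not caught up
store.replicate()
caught_up = store.read("cart:42", consistency="eventual")
```

In this sketch a checkout page might ask for the strong read, while a recommendations widget could accept the eventual one in exchange for a read that never waits on the primary.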

3. Also out: Choosing one particular style of database

Cosmos DB also doesn't force a commitment to a conventional column-style, key/value, or document-based paradigm. Existing NoSQL systems like MongoDB can use Cosmos DB as a storage back end, or Cosmos DB can be queried by conventional SQL. It's also possible to use Cosmos DB as a graph database with the Gremlin graph database API (available only in preview as of this writing).

What Microsoft is offering here isn't one particular kind of database. It's a universal back end for different kinds of databases—likely including future styles of database that haven't been invented yet. Rather than devise a new product to support customer demand for a new database, Microsoft could use Cosmos DB as the substrate and get to market faster than the competition. Customers who've already built on top of Cosmos DB would be able to dive in and start swimming sooner, too.
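The "universal back end" idea can be sketched as one shared record store with several thin access layers on top. Everything below is hypothetical (the names `Substrate`, `KeyValueView`, and `DocumentView` are invented for this example) and says nothing about how Cosmos DB is actually built; it only shows how two database styles can share one substrate.

```python
class Substrate:
    """Hypothetical shared storage layer: a flat map of id -> record."""

    def __init__(self):
        self.records = {}


class KeyValueView:
    """Key/value access over the shared substrate."""

    def __init__(self, substrate):
        self._substrate = substrate

    def put(self, key, value):
        self._substrate.records[key] = value

    def get(self, key):
        return self._substrate.records.get(key)


class DocumentView:
    """Document-style access: filter stored dicts by field values."""

    def __init__(self, substrate):
        self._substrate = substrate

    def find(self, **criteria):
        # Return every dict record whose fields match all given criteria.
        return [
            doc
            for doc in self._substrate.records.values()
            if isinstance(doc, dict)
            and all(doc.get(field) == value for field, value in criteria.items())
        ]


substrate = Substrate()
kv = KeyValueView(substrate)
docs = DocumentView(substrate)

# Data written through one view is visible through the other.
kv.put("user:1", {"name": "Ada", "city": "London"})
kv.put("user:2", {"name": "Lin", "city": "Taipei"})
londoners = docs.find(city="London")
```

Adding a new style of database in this model means writing another view class, not another storage engine, which is the time-to-market argument made above.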

4. Same goes for database management generally

If there's a consistent theme with databases as a service, it's relieving the hassle of managing a database. The bigger and more complex the average workload, the more prominent the appeal of this approach. Case in point: Snowflake, a data warehouse as a service with minimal management needs.

Cosmos DB has several of the same ambitions, as many of its features need little hands-on tuning to run well. For instance, it provides mechanisms for partitioning data, but they're decoupled from the applications using the data, so changes to one don't automatically require changes to the other. The schema-less design of the system also reduces the amount of work needed to make global changes like adding columns.
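A small sketch shows what decoupled partitioning and a schema-less design buy the application. It assumes simple hash partitioning and an invented `PartitionedStore` class; Cosmos DB's real partitioning scheme is not described in this article and may work quite differently.

```python
import hashlib


class PartitionedStore:
    """Hypothetical store that hides its partition layout from callers."""

    def __init__(self, partition_count):
        self._partitions = [{} for _ in range(partition_count)]

    def _index(self, key):
        # Stable hash so a key always maps to the same partition
        # for a given partition count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._partitions)

    def put(self, key, doc):
        self._partitions[self._index(key)][key] = doc

    def get(self, key):
        return self._partitions[self._index(key)].get(key)

    def repartition(self, new_count):
        # Operators can change the layout; application calls stay the same.
        items = [(k, v) for part in self._partitions for k, v in part.items()]
        self._partitions = [{} for _ in range(new_count)]
        for key, doc in items:
            self.put(key, doc)


store = PartitionedStore(partition_count=2)

# Schema-less: documents need not share fields, so adding a "column"
# is just writing a document that carries the new field.
store.put("order:1", {"item": "book"})
store.put("order:2", {"item": "lamp", "gift_wrap": True})

before = [store.get("order:1"), store.get("order:2")]
store.repartition(5)
after = [store.get("order:1"), store.get("order:2")]
```

The application only ever calls `put` and `get` with a key, so growing from two partitions to five requires no change on its side.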

The convenience of any cloud-hosted database is universally appealing, but the frontier for those conveniences won't stop at hosted, managed versions of known quantities like MySQL or PostgreSQL, or even SQL Server. What's next, and what has already begun to arrive, are databases built not only to be cloud-first but also to challenge the assumption that the hard choices we once had to make when picking such products still need to be made at all.