Cosmos DB review: Database for a small planet

Multi-model Azure database combines global reach and a choice of five consistency models, allowing you to trade off cost for consistency

If you do business around the world, you may need a scalable, distributed, global database. You have several choices including Google Cloud Spanner, the open source CockroachDB, the graph database Neo4j, and the subject of this review, Azure Cosmos DB.

Azure Cosmos DB is a globally distributed, horizontally partitioned, multi-model database service. It offers four data models (key-value, column family, document, and graph) and five tunable consistency levels (strong, bounded staleness, session, consistent prefix, and eventual consistency). It offers five API sets: SQL (dialect), MongoDB-compatible, Azure Table-compatible, graph (Gremlin), and Apache Cassandra-compatible. Cosmos DB automatically indexes all data without requiring you to deal with schema and index management.

The design goals of Cosmos DB include elastic global scalability, low-cost operation, low read and write latencies, 99.99 percent availability, predictable and tunable data consistency, stringent financially-backed comprehensive SLAs, automatic schema/index management and versioning, native support for multiple data models, and popular APIs for accessing data. Yes, that’s a mouthful, as well as just a little ambitious, but it all works.

Cosmos DB is both similar to and different from each of Google Cloud Spanner, CockroachDB, and Neo4j. Google Cloud Spanner and CockroachDB offer SQL, strong consistency, cluster consensus, automatic horizontal partitioning, and the use of clock synchronization across nodes; Azure Cosmos DB does all of that, and it offers weaker consistency models that provide better performance and lower latency.

Google Cloud Spanner, CockroachDB, and Cosmos DB use markedly different dialects of SQL, with CockroachDB being the only one that is compatible with a widely adopted database (PostgreSQL). CockroachDB and Cosmos DB use hybrid logical clocks; Google Cloud Spanner uses the TrueTime API, GPS clocks, and atomic clocks.

Google Cloud Spanner and Cosmos DB are cloud services; CockroachDB and Neo4j can be downloaded and deployed on-prem. Google Cloud Spanner and Cosmos DB use Leslie Lamport’s Paxos algorithm for cluster consensus; CockroachDB uses the Raft algorithm. (Lamport, of Microsoft Research, was highly influential in the development of Cosmos DB.)

Neo4j Enterprise offers a property graph with cluster support. So does Cosmos DB, but with a different query language. Neo4j uses the Cypher Query Language. Cosmos DB uses Gremlin, the graph traversal language of Apache TinkerPop.

Cosmos DB’s global distribution

Microsoft’s Azure cloud is generally available in 36 regions around the world, with plans announced for six additional regions. Cosmos DB is available in all Azure regions worldwide, and you can add unlimited regions to your Cosmos DB instance, although there is (naturally) a higher cost to operate more regions. You can configure global distribution in the Azure Portal or using the SQL APIs.

Each Cosmos DB database has one active write (or read/write) region, and may have multiple read regions. Microsoft notes that Cosmos DB does support multi-region writes (multi-master/active-active pattern) for internal use, and this will be exposed to third-party customers in the future.

Regions may be added and dropped at any time, and the write region may be changed at any time without risk of data loss or violation of SLAs. When a new region is added, Cosmos DB will be available in the new region within 30 minutes, no matter where in the world the new region is located, for up to 100 TB of data, thanks to parallel data transport between regions.

You can create global distribution for the goal of low latency or for BCDR (business continuity and disaster recovery). For low latency, you want regions in the geographical areas of the users. For BCDR, you want to use paired regions within a single geographical area, such as the East US and West US regions within North America.

Each region has a full replica of all the data, but the database is also horizontally partitioned automatically. Behind the scenes, Cosmos DB will divide your container with the throughput you provisioned into enough partitions to handle the maximum throughput. Logically, Cosmos DB allocates the key space of partition key hashes evenly across the partitions. (Note that this requires you to choose good partition keys.) It will automatically split a partition that reaches its storage limit, and automatically add partitions as you (or your auto-scaling settings) increase the maximum throughput. You can select the Unlimited storage capacity option to take advantage of partitioning and auto-scaling when creating a container in the Azure Portal.
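
To see why the choice of partition key matters, here is a minimal sketch of hash partitioning in plain Python. It is an illustration only: the SHA-256 hash, the four partitions, and the sample keys are assumptions made for the example, not Cosmos DB's internal hash function or partition layout.

```python
import hashlib
from collections import Counter

HASH_SPACE_BITS = 64  # assumed size of the hash key space for this sketch


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition by hashing it into an evenly divided key space."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    # Each partition owns an equal, contiguous slice of the hash space.
    return (hashed * partition_count) >> HASH_SPACE_BITS


def distribution(keys, partition_count):
    """Count how many items land on each partition."""
    return Counter(partition_for(key, partition_count) for key in keys)


# High-cardinality key (hypothetical user IDs): items spread across all partitions.
good = distribution((f"user-{i}" for i in range(10_000)), 4)

# Low-cardinality key (hypothetical country code): every item lands on one partition.
bad = distribution(("US" for _ in range(10_000)), 4)

print("user IDs:", sorted(good.items()))
print("country :", sorted(bad.items()))
```

With many distinct key values the items spread over all four partitions, while a key that takes a single value piles everything onto one partition, which would then carry all of the storage and throughput load.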

[Figure: Cosmos DB geo-redundancy. Image: IDG]

Azure Cosmos DB was designed from the ground up for geo-redundancy. The default choice for BCDR in the US is to pair the West US and East US Azure regions. To improve read latency in other areas of the world, add replicas in their regions.

Cosmos DB data models and APIs

I mentioned earlier that Cosmos DB offers four data models (key-value, column family, document, and graph) and five API sets: SQL (document-oriented dialect), MongoDB-compatible, Azure Table-compatible, graph (Gremlin), and Apache Cassandra-compatible. When you create a database account, you need to specify the API it will support. 

At the lowest level, Cosmos DB has a schema-agnostic, atom-record-sequence (ARS)-based database engine implemented on top of Azure Service Fabric. The four application data models are all projected onto the ARS-based core model.

As you might expect, the five database API sets don’t all map onto every data model. The SQL API used to be called the DocumentDB API; it applies to JSON document databases. The MongoDB API is also for document databases. Note that the wire protocol is different for the MongoDB API and the SQL API, so that one account can’t be used for both APIs, even though the document data can be migrated between the two.

The Gremlin API is for property graph databases, the Azure Table API is for key-value tables, and the Cassandra API is for wide-column (column family) databases.

While the SQL API is compatible with Azure DocumentDB, it is not the standard SQL familiar to someone used to relational databases such as Azure SQL, even though the query syntax mostly looks familiar. It’s closer to the JSON functionality in PostgreSQL (and recently in CockroachDB). For example, the following sample query is Cosmos DB SQL operating on JSON documents to extract the names of the children in the Wakefield Family, youngest to oldest:

SELECT c.givenName
FROM Families f
JOIN c IN f.children
WHERE f.id = 'WakefieldFamily'
ORDER BY f.children.grade ASC
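
For readers who think more easily in code than in SQL, the result this query is described as producing can be sketched in plain Python. The sample documents below are hypothetical, and the sketch assumes the intent is to sort the children by their grade; it shows the shape of the result, not how Cosmos DB evaluates the query.

```python
# Hypothetical sample data, shaped like the family documents the query assumes.
families = [
    {
        "id": "WakefieldFamily",
        "children": [
            {"givenName": "Lisa", "grade": 8},
            {"givenName": "Jesse", "grade": 1},
        ],
    },
    {
        "id": "AndersenFamily",
        "children": [{"givenName": "Henriette", "grade": 5}],
    },
]


def children_by_grade(documents, family_id):
    """Return the given names of one family's children, lowest grade first."""
    # JOIN c IN f.children: pair each family with each of its own children.
    pairs = [(f, c) for f in documents for c in f.get("children", [])]
    # WHERE f.id = '<family_id>'
    matches = [c for f, c in pairs if f["id"] == family_id]
    # ORDER BY the child's grade, ascending (grade stands in for age here).
    matches.sort(key=lambda c: c["grade"])
    # SELECT c.givenName
    return [{"givenName": c["givenName"]} for c in matches]


print(children_by_grade(families, "WakefieldFamily"))
```

The JOIN here is a self-join within each document: it pairs a family only with its own children, never with children from other documents.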

Like PostgreSQL, Cosmos DB creates inverted indexes on the JSON documents. The SQL API is rooted in JavaScript’s type system, expression evaluation, and function invocation, and can perform relational projections, hierarchical navigation across JSON documents, self joins, spatial queries, and invocation of user-defined functions (UDFs) written in JavaScript.

In addition to five database API sets, you can choose your application programming language. Depending on the API, the choices may include .NET, Go, Java, Node.js, Python, Xamarin, or any language that supports HTTPS REST calls. The Gremlin (graph) API also supports the Apache TinkerPop Gremlin console, and the MongoDB API also supports Studio 3T and other MongoDB clients.

[Figure: Creating a Cosmos DB account. Image: IDG]

When you create a Cosmos DB account, you must choose a single API. The current API choices are SQL, MongoDB, Cassandra, Azure Table, and Gremlin (graph).

Cosmos DB consistency levels

Cosmos DB gives you a choice of five tunable consistency levels (strong, bounded staleness, session, consistent prefix, and eventual). Strong consistency, meaning that reads are guaranteed to return the most recent version of data, is what you expect from SQL databases, and what you need for financial transactions. Eventual consistency, which allows out-of-order reads, is what you expect from most NoSQL databases, and can introduce errors in the application if clients are attempting to read recent transactions that have not fully committed in all regions.

The in-between levels are less familiar. Consistent prefix means that the updates returned are some prefix of all the updates, with no gaps. (A prefix is a set of updates that precede some timestamp T1 that is earlier than the request timestamp T2.) Bounded staleness is a stronger form of consistent prefix; it means the database ensures that reads lag behind writes by at most k versions (updates) or by at most a time interval t.
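
The two definitions can be made concrete with a small sketch. It models the write history as an ordered list and a read as the list of updates a replica has applied; the function names and the version-count bound are illustrative assumptions, and a time-based bound would follow the same pattern.

```python
def is_consistent_prefix(log, seen):
    """True if `seen` is exactly the first len(seen) writes of `log`, in order, with no gaps."""
    return list(seen) == list(log[: len(seen)])


def version_lag(log, seen):
    """Number of committed writes the reader has not yet observed."""
    return len(log) - len(seen)


def within_bounded_staleness(log, seen, max_versions):
    """Consistent prefix plus a cap on how many versions the read may lag behind."""
    return is_consistent_prefix(log, seen) and version_lag(log, seen) <= max_versions


log = ["w1", "w2", "w3", "w4", "w5"]  # hypothetical global write order

print(is_consistent_prefix(log, ["w1", "w2"]))                 # a gap-free prefix
print(is_consistent_prefix(log, ["w1", "w3"]))                 # skips w2
print(within_bounded_staleness(log, ["w1", "w2", "w3"], 2))    # lags by two versions
print(within_bounded_staleness(log, ["w1"], 2))                # lags by four versions
```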

Session consistency, chosen by more than 70 percent of Cosmos DB tenants, is a slightly weaker form of consistent prefix than bounded staleness; it gives you monotonic reads, monotonic writes, read-my-writes, and write-follows-reads. Session consistency is scoped to a client session, and the cost of a read operation (in terms of request units, or RUs, consumed) with session consistency is less than strong and bounded staleness, but more than eventual consistency.
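
One way to picture session scoping is a client that carries a token recording the newest version it has written or read, and only accepts answers from replicas that have caught up to that token. The sketch below is a toy model under that assumption; the `Replica` and `Session` classes and the integer token are hypothetical and far simpler than the service's real session tokens.

```python
class Replica:
    """A copy of the data that has applied writes up to `version`."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class Session:
    """Client-side session that remembers the highest version it has written or read."""

    def __init__(self):
        self.token = 0

    def write(self, primary, key, value):
        version = primary.version + 1
        primary.apply(version, key, value)
        self.token = max(self.token, version)  # enables read-my-writes

    def read(self, replicas, key):
        for replica in replicas:
            if replica.version >= self.token:  # replica is fresh enough for this session
                self.token = max(self.token, replica.version)  # enables monotonic reads
                return replica.name, replica.data.get(key)
        raise RuntimeError("no replica has caught up to this session")


primary = Replica("write-region")
lagging = Replica("read-region")  # has not received the write yet

writer = Session()
writer.write(primary, "cart", ["book"])

# The writer's own session skips the stale replica and sees its write.
print(writer.read([lagging, primary], "cart"))

# A different session has no such guarantee and may read the stale replica.
print(Session().read([lagging, primary], "cart"))
```

The guarantee applies only within the session that holds the token, which is why it can cost less than strong consistency: other clients are still free to read from replicas that lag.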

You can change the consistency level on a per-read or per-query basis. So, for example, if you populate a pick list with a query using bounded staleness with a 5,000 ms interval, it is the rough equivalent of doing 5,000 ms stale reads on Google Cloud Spanner. This is a reasonable trick for speeding up reads that don’t need to be up-to-the-minute, so long as your program logic compensates for the fact that an item might no longer be available when picked. Good program logic would do that anyway for this use case, because of the multi-second lag incurred by waiting for a human to click on the interface.

Cosmos DB SLAs

For a typical 1KB item, Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the 99th percentile, within the same Azure region. The median latencies are significantly lower (under five milliseconds).

Why is this guarantee restricted to a single region? Simple physics: The round-trip time (also known as the RTT, ping time, or network latency) between regions is limited by the speed of light, so East US to Australia Southeast has an RTT of about 250 ms.
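
A back-of-the-envelope calculation shows the floor that physics sets. The sketch below assumes approximate coordinates for Richmond, Virginia, and Melbourne, Victoria, as stand-ins for the two regions, a straight great-circle path, and light traveling in optical fiber at about two-thirds of its vacuum speed; real cable routes are longer and add switching delays.

```python
import math

EARTH_RADIUS_KM = 6371.0
LIGHT_IN_VACUUM_KM_S = 299_792.458
LIGHT_IN_FIBER_KM_S = LIGHT_IN_VACUUM_KM_S / 1.5  # assumes a refractive index of about 1.5


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km, speed_km_s):
    """Lower bound on round-trip time: there and back at the given signal speed."""
    return 2 * distance_km / speed_km_s * 1000


# Assumed stand-in coordinates, not actual datacenter locations.
EAST_US = (37.5, -77.4)               # near Richmond, Virginia
AUSTRALIA_SOUTHEAST = (-37.8, 145.0)  # near Melbourne, Victoria

distance = great_circle_km(*EAST_US, *AUSTRALIA_SOUTHEAST)
print(f"distance: {distance:,.0f} km")
print(f"minimum RTT in vacuum: {min_rtt_ms(distance, LIGHT_IN_VACUUM_KM_S):.0f} ms")
print(f"minimum RTT in fiber:  {min_rtt_ms(distance, LIGHT_IN_FIBER_KM_S):.0f} ms")
```

Under these assumptions the fiber lower bound comes out on the order of 160 ms, so an observed RTT of about 250 ms is plausible once indirect cable routes and network equipment are taken into account.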

Cosmos DB offers comprehensive 99.99 percent SLAs that guarantee throughput, consistency, availability, and latency for Cosmos DB database accounts scoped to a single Azure region configured with any of the five consistency levels, or database accounts spanning multiple Azure regions configured with any of the four relaxed consistency levels. Furthermore, independent of the choice of a consistency level, Cosmos DB offers a 99.999 percent SLA for read availability for database accounts spanning two or more Azure regions.
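
To get a feel for what the extra nine buys, the percentages can be converted into a downtime budget. This is simple arithmetic over a 365-day year, used here only for intuition; the actual SLA terms may define a different measurement period.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(sla_percent, period_minutes=MINUTES_PER_YEAR):
    """Maximum unavailable time in the period that still meets the SLA percentage."""
    return period_minutes * (1 - sla_percent / 100)


for sla in (99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes per year")
```

The 99.99 percent level allows roughly 52.6 minutes of unavailability per year, and the 99.999 percent read-availability level for multi-region accounts allows roughly 5.3 minutes.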

[Figure: Cosmos DB geo demo latency charts. Image: IDG]

These charts represent an app that uses a Cosmos DB database account configured with a write region in North Europe, and several read regions including West US, South Central US, Japan East, Japan West, Southeast Asia, East Asia, and Australia Southeast. The charts for East US, Central US, North Central US, West Europe, and Australia East represent regions that have not been added to this database account. The SDK used by the app deployed in those regions therefore performs a cross-region call, and you see higher latencies.

Cosmos DB integrations

As you might expect from a Microsoft Azure product, Cosmos DB integrates with many other Microsoft Azure products. You can, for example, bind Cosmos DB with Azure Functions through the Cosmos DB change feed to listen for changes across partitions. The change feed publishes inserts and updates, but not deletions. That may seem a bit strange to people used to the triggers in relational databases.
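
The difference from a relational trigger is easier to see in a toy model. The in-memory `Container` class below is hypothetical and is not the Cosmos DB SDK; it only mimics the behavior described here, in which inserts and updates appear in the feed and deletions do not. It also keeps every intermediate version for simplicity, which the real feed may not do.

```python
class Container:
    """Toy in-memory container with an append-only change feed (illustrative only)."""

    def __init__(self):
        self.items = {}
        self.change_feed = []

    def upsert(self, item_id, body):
        """Insert or update an item; both kinds of change are published to the feed."""
        self.items[item_id] = body
        self.change_feed.append({"id": item_id, "body": body})

    def delete(self, item_id):
        """Remove an item; nothing is published to the feed."""
        self.items.pop(item_id, None)

    def read_changes(self, continuation=0):
        """Return changes after `continuation`, plus the position to resume from."""
        return self.change_feed[continuation:], len(self.change_feed)


container = Container()
container.upsert("order-1", {"status": "new"})      # insert
container.upsert("order-1", {"status": "shipped"})  # update
changes, position = container.read_changes()

container.delete("order-1")                         # deletion
later_changes, _ = container.read_changes(position)

print(changes)        # the insert and the update
print(later_changes)  # empty: the listener never hears about the delete
```

A listener that depends on the feed alone therefore has no signal that `order-1` is gone, which is the behavior that surprises people used to DELETE triggers.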

You can also use Cosmos DB with HDInsight to create a Lambda architecture, which enables efficient data processing of massive datasets. Lambda architectures use batch processing, stream processing, and a serving layer to minimize the latency involved in querying big data. The other pieces of this are Apache Spark for Azure HDInsight, the Spark to Azure Cosmos DB Connector, and the Cosmos DB change feed that we mentioned in conjunction with Azure Functions.

The same idea extends to other big data applications on Azure. For example, you can connect Cosmos DB to Spark GraphX, to Kafka, to Azure Search, and to Azure Stream Analytics.

Cosmos DB applications

Microsoft has used Cosmos DB for internal applications for some time. For example, Azure Portal uses Cosmos DB as its global transactional store, as do Xbox and Skype. Microsoft shared a graph with me showing the breakdown of its internal workloads for a three-day period in 2017: Three trillion request units (RUs) were used in 20 Azure regions, with the highest consumption being almost a trillion RUs in the Central US region.

Cosmos DB emulator

For local development on Windows, download the free Azure Cosmos DB Emulator to develop and test applications using Cosmos DB. Once your application works, you can deploy it by changing your configuration to point to an Azure Cosmos DB instance in the cloud.

At a Glance
  • If you’re a committed Azure platform user who needs a globally distributed database service, Cosmos DB is probably your most cost-effective option.

    Pros

    • Globally distributed, horizontally partitioned, multi-model database service
    • Supports four types of NoSQL data model, five API sets including a SQL dialect, and at least six programming environments
    • Automatically shards horizontally, creates and manages indexes, scales, and synchronizes regions
    • Has a range of consistency models to fulfill application requirements at least cost
    • Compatible with MongoDB, Azure Tables, and Apache TinkerPop

    Cons

    • Requires a separate account for each API used
    • Not compatible with the SQL used by any relational database
    • The choices of data model, API set, consistency model, and regions to support may not be obvious