DataStax CEO: Let's clear the air about NoSQL and ACID

Many still believe NoSQL databases can't play at the same level as their relational forebears. DataStax CEO Billy Bosworth puts that notion to rest

This week on the New Tech Forum, DataStax CEO Billy Bosworth offers an incisive essay on a key difference between the relational and NoSQL mindsets. Bosworth has spent many years in the database and development world, and has become a passionate proponent of NoSQL databases such as Apache Cassandra, the open source NoSQL database around which his company is built.

Bosworth zeroes in on the misinformation and misunderstanding surrounding NoSQL, particularly in regard to ACID (atomicity, consistency, isolation, durability) compliance. To address that issue, he has compiled a list of common misconceptions about ACID compliance, databases, and modern application development.

Top 5 misconceptions about ACID compliance in a nonrelational world
I graduated college in 1992, fresh computer science degree in hand, about to spend the next 10 years writing client/server applications. Oracle and SQL Server were my tools of choice, and I quickly learned the basics of relational databases. First was 3NF data modeling, originally defined by the legendary E.F. Codd. Next came an understanding of ACID properties, which lie at the heart of predictable relational database behavior.

For over 20 years, DBAs like myself wrote millions of applications according to the unbreakable laws of 3NF and ACID. Without realizing it, we came to believe that only the relational way of thinking made sense.

Then the world changed, and the demands placed on applications shifted. For online applications, two things matter more than just about anything else: performance at extreme scale and applications always staying available in a completely connected world.

I first encountered the new trend while reading an article titled "Your Coffee Shop Doesn't Use Two-Phase Commit." Nonrelational concepts were entering mainstream thinking. Nonetheless, I continued believing that certain use cases were far too taboo when it came to breaking the relational laws, the quintessential example being the "ATM problem." Surely, critical applications still required locks and complex transactions with rollback/commit capabilities? Eric Brewer has since shattered even that belief, and now it seems that virtually any use case is up for grabs.

All this requires new ways of thinking. With performance and availability emerging as paramount considerations, you need to make some changes in how you deal with such matters as the consistency of your data and how you build the data model itself. One way to enable this paradigm shift is to address the top five misconceptions that relational-minded developers and DBAs hold about ACID compliance for modern applications.

The confusion often starts when I call a postrelational database such as Apache Cassandra "transactional" in nature. One of the first questions I get is: "Oh, is it ACID compliant?" Or a common variant of the question: "But Cassandra is eventually consistent, so it can't be ACID compliant, right?" Both betray a misunderstanding of how modern databases are solving the challenges around online applications.

Misconception No. 1: You can't build an online application without ACID compliance
This misconception is flat-out wrong and largely stems from built-in biases that we RDBMS folks have developed over the past two decades. Fortunately, you can easily find hundreds of companies, such as eBay, Instagram, and Netflix, building mission-critical online applications without full ACID compliance.

Misconception No. 2: ACID is an all-or-nothing proposition
Many people forget that ACID is an acronym representing four distinct characteristics: atomicity, consistency, isolation, and durability. In today's world of online applications, developers and architects make trade-offs to serve the greatest need.

Modern NoSQL databases offer various pieces of ACID that serve the needs of a given application just fine. Postrelational technologies often sacrifice consistency for performance reasons, while following (at least partially) the "AID" aspects of ACID. That's what I mean when I say it's not an all-or-nothing proposition. Parts of ACID may still remain relevant for your application, so you optimize for those accordingly.

Misconception No. 3: Eventual consistency violates the "C" in "ACID"
A few years ago, I wrote a blog post to address this misconception more thoroughly, but the gist is that for many DBAs, the word "consistency" has two very different meanings. "Consistency" in ACID refers to the enforcement of constraints or rules for a given entry in a database. When we talk about eventual consistency in a database such as Cassandra, we mean something entirely different -- namely, the temporal accuracy of the data itself.

In a distributed system, the same piece of data is usually replicated to multiple machines. When you update that piece of information on one of the machines, it may take some time (usually milliseconds) to reach every machine that holds the replicated data. This creates the possibility that a read served by one replica returns information that hasn't yet been updated there. In the old relational world, this conundrum comes very close to what we call a "dirty read."

This approach can present its own set of challenges at times, but the point is that eventual consistency is a different issue than the consistency definition you find in ACID. In order for developers to appropriately manage this new dynamic, it's important that they not confuse the two definitions.
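To make the two definitions concrete, here is a minimal Python sketch. The first function uses the standard sqlite3 module to show the ACID sense of consistency: a declared rule (a non-negative balance) is enforced and the offending update is rolled back. The second part is a purely hypothetical toy model, ToyReplicatedRegister, that shows the temporal sense: a replica can briefly return an older value until the update reaches it. The class, its method names, and the choice to apply the write to one replica first are assumptions made for illustration only; this is not how Cassandra or any other real database implements replication.

```python
import sqlite3


def acid_consistency_demo():
    """ACID 'C': the database refuses a change that breaks a declared rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    conn.commit()
    try:
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
    except sqlite3.IntegrityError:
        pass  # the CHECK constraint rejected the overdraft
    balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
    conn.close()
    return balance


class ToyReplicatedRegister:
    """Hypothetical toy model: one value copied to several replicas."""

    def __init__(self, replica_count, initial):
        self.replicas = [initial] * replica_count
        self.pending = []  # (replica_index, value) updates not yet delivered

    def write(self, value):
        # Assumption for the sketch: replica 0 sees the write immediately,
        # the remaining replicas receive it only when deliver_pending() runs.
        self.replicas[0] = value
        self.pending.extend((i, value) for i in range(1, len(self.replicas)))

    def read(self, replica_index):
        return self.replicas[replica_index]

    def deliver_pending(self):
        # Stands in for the milliseconds of replication delay.
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()


if __name__ == "__main__":
    print(acid_consistency_demo())       # balance is unchanged by the rejected update
    reg = ToyReplicatedRegister(3, "old")
    reg.write("new")
    print(reg.read(0), reg.read(2))      # replica 2 has not caught up yet
    reg.deliver_pending()
    print(reg.read(2))                   # now every replica agrees
```

Note that nothing in the toy register violates a constraint; the stale read is a question of timing, not of rule enforcement, which is why the two meanings should be kept apart.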

Misconception No. 4: Databases and applications have a 1:1 relationship, so it's either/or between relational and NoSQL technologies
Years ago when we wrote applications at much smaller scale, life was easy: We picked a database, wrote our application, and were done. In fact, a single database system such as an Oracle instance would often house multiple schemas that represented different applications.

Today's applications are much more sophisticated. The idea of running a single database technology for a datastore is passé. A few years ago, Martin Fowler highlighted a trend called "polyglot persistence" that is now the normative architecture for modern applications. Virtually every customer I talk to houses a services layer between multiple database technologies and the app they power. This trend has established itself firmly in the real world. Polyglot persistence allows us to use the right technology for the right workload inside an application, which can pay huge dividends and enable functionalities not possible in a 1:1 schema.
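As a rough illustration of the services layer described above, the following Python sketch puts one small service in front of two different stores. Everything here is hypothetical: OrderStore, ActivityStore, and ShopService are names invented for this example, the relational side is an in-memory SQLite database, and the nonrelational side is a plain dictionary standing in for a distributed store. The intent is only to show the shape of the idea, with each workload going to the technology that suits it.

```python
import sqlite3


class OrderStore:
    """Relational store (assumed) for records that benefit from constraints."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders ("
            " id INTEGER PRIMARY KEY,"
            " customer TEXT NOT NULL,"
            " total_cents INTEGER NOT NULL CHECK (total_cents >= 0))"
        )

    def add(self, customer, total_cents):
        with self.conn:  # commit on success, roll back on error
            cur = self.conn.execute(
                "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
                (customer, total_cents),
            )
        return cur.lastrowid

    def total_for(self, customer):
        row = self.conn.execute(
            "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]


class ActivityStore:
    """Dictionary stand-in (assumed) for a write-heavy, key-based NoSQL store."""

    def __init__(self):
        self.events = {}

    def append(self, customer, event):
        self.events.setdefault(customer, []).append(event)

    def recent(self, customer, limit=10):
        return self.events.get(customer, [])[-limit:]


class ShopService:
    """Services layer: the application talks to this, never to a store directly."""

    def __init__(self, orders, activity):
        self.orders = orders
        self.activity = activity

    def place_order(self, customer, total_cents):
        order_id = self.orders.add(customer, total_cents)
        self.activity.append(customer, f"order:{order_id}")
        return order_id

    def record_view(self, customer, sku):
        self.activity.append(customer, f"view:{sku}")

    def customer_summary(self, customer):
        return {
            "spent_cents": self.orders.total_for(customer),
            "recent": self.activity.recent(customer),
        }


if __name__ == "__main__":
    service = ShopService(OrderStore(), ActivityStore())
    service.place_order("ada", 1200)
    service.record_view("ada", "sku-1")
    print(service.customer_summary("ada"))
```

Because the application only sees ShopService, either store could be swapped for a different technology without changing the calling code, which is the practical appeal of the polyglot approach.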

Misconception No. 5: NoSQL databases are for "Web scale" applications only; everything else uses ACID-compliant technology
Some people believe that only a niche subset of applications requires scalable, distributed databases. If you stop and think about it, how many developers today build online applications that don't keep "Web scale" in mind? It's like saying, "Only a small number of cars require the speeds necessary for highway travel."

That may have been true when only a few highways existed, but now they are part of everyday travel. The same is true for Web-scale applications. Given the always-connected nature of endpoints -- whether they are users, devices, or sensors -- what other kind of online application would you write? When are performance and availability not a high priority?

Why it matters
More than a decade has passed since Eric Brewer presented what is now famous as Brewer's Conjecture, which most people ultimately came to know as the CAP theorem. It got us all thinking that the ability to scale out -- represented by "partition tolerance" (the "P" in "CAP") -- might be important enough to justify giving up something as sacrosanct as data consistency within an application. He was right. Today, two attributes reign supreme for online applications: availability and performance. Without both, it's nearly impossible to keep up with the relentless flow of data and the unyielding demands of users.

Eric Brewer has continued to refine his understanding of the CAP theorem in light of new technologies. At the same time, developers, architects, and database administrators are advancing their own understanding of such trade-offs. At the very least, they're realizing that we don't live in a one-size-fits-all world and must employ the right technology for the right job. Those who free their minds of relational misconceptions will harness the power and opportunity provided by this new world of diverse database technologies.

New Tech Forum provides a means to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all enquiries to newtechforum@infoworld.com.

This article, "DataStax CEO: Let's clear the air about NoSQL and ACID," was originally published at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.
