Big data showdown: Cassandra vs. HBase

Bigtable-inspired open source projects take different routes to the highly scalable, highly flexible, distributed, wide column data store

The win column
The real work begins when you must tune a cluster for your particular application. Given the size of the data sets involved and the complexity of building and managing a multinode cluster (one that often spans multiple data centers), tuning is hardly straightforward. It demands a solid understanding of the interplay of the cluster's memory caching, disk storage, and internode communications, and it requires careful monitoring of cluster behavior.

It's true that HBase's reliance on ZooKeeper -- a separate application -- introduces an additional point of failure (and the attendant difficulties of troubleshooting the source of a problem) that Cassandra avoids. But it isn't the case that tuning a Cassandra cluster is orders of magnitude less difficult. In the end, when you compare the travails of tuning a cluster for each database, it's probably a wash.

Which means, as usual, there is no clear winner or loser. You'll find zealots for both databases, and each camp will present compelling evidence demonstrating the superiority of their system. And as usual, you'll face the chore of taking each for a test drive and benchmarking them against your target application. But given the scope of these technologies, could there be any other way?
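
A test drive of that kind can start small. The Python sketch below is one possible shape for a latency harness, not a full benchmarking tool: it times repeated calls to an operation and reports a few summary statistics. The names `benchmark`, `fake_write`, and `store` are hypothetical, and `fake_write` is an in-memory stand-in that you would replace with a real read or write against the cluster under test.

```python
import math
import statistics
import time


def benchmark(operation, iterations=1000):
    """Time repeated calls to `operation` and summarize latency in milliseconds."""
    latencies = []
    for i in range(iterations):
        start = time.perf_counter()
        operation(i)  # the call under test; receives the iteration number
        latencies.append((time.perf_counter() - start) * 1000.0)

    latencies.sort()
    # Nearest-rank 99th percentile: the smallest sample that is greater than
    # or equal to 99 percent of all samples.
    p99_index = max(0, math.ceil(0.99 * len(latencies)) - 1)
    return {
        "count": len(latencies),
        "mean_ms": statistics.mean(latencies),
        "median_ms": statistics.median(latencies),
        "p99_ms": latencies[p99_index],
        "max_ms": latencies[-1],
    }


# Hypothetical stand-in for a database write. Swap in a call to your own
# client code to measure a real cluster.
store = {}


def fake_write(i):
    store[f"row-{i}"] = {"value": i}


result = benchmark(fake_write, iterations=500)
```

Running the same harness with the same workload against each database keeps the comparison like for like. Tail figures such as the 99th percentile are often more telling than the mean, because stalls in a distributed store tend to show up there first.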

Cassandra vs. HBase at a glance

Cassandra pros
  • Symmetric architecture makes it relatively easy to create and scale large clusters
  • SQL-like Cassandra Query Language eases developers' transition from RDBMS
  • Allows you to tune for performance or consistency or a balance of both
  • Community edition of management GUI available
  • Good documentation (provided by DataStax)

HBase pros
  • Built-in versioning
  • Strong consistency at the record level
  • Provides RDBMS-like triggers and stored procedures through coprocessors
  • Built on tried-and-true Hadoop technologies
  • Active development community

Cassandra cons
  • Configuration is complex
  • Current trigger/stored procedure mechanism experimental
  • Management GUI difficult to get up and running

HBase cons
  • Lacks friendly, SQL-like query language
  • Lots of moving parts
  • Setup beyond a single-node development cluster can be difficult

Platforms
  • Cassandra: CentOS, Red Hat, Debian, Ubuntu, Mac OS X, Windows
  • HBase: Requires Java SE version 6; can be run on Windows using Cygwin

Cost
  • Cassandra: Free, open source under the Apache License version 2.0
  • HBase: Free, open source under the Apache License version 2.0
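
The Cassandra entry about tuning for performance or consistency comes down to simple arithmetic. With N replicas of each row, a read that waits for R replicas is sure to overlap the most recent successful write acknowledged by W replicas whenever R + W > N. The Python sketch below illustrates that rule only. The function names are hypothetical, the level names follow Cassandra's ONE, TWO, THREE, QUORUM, and ALL settings, and the model deliberately ignores per-data-center levels and node failures.

```python
def replicas_required(level, replication_factor):
    """Return how many replicas must respond at a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        required = fixed[level]
    elif level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        required = replication_factor // 2 + 1
    elif level == "ALL":
        required = replication_factor
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    if required > replication_factor:
        raise ValueError(f"{level} needs more replicas than exist")
    return required


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


# Illustrative settings for a replication factor of 3.
quorum_both = reads_see_latest_write("QUORUM", "QUORUM", 3)  # 2 + 2 > 3
one_both = reads_see_latest_write("ONE", "ONE", 3)           # 1 + 1 > 3 fails
```

Lower levels such as ONE answer sooner because fewer replicas must respond, at the price of possibly stale reads. That trade is the balance of performance and consistency the list refers to.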

