Big data showdown: Cassandra vs. HBase

Bigtable-inspired open source projects take different routes to the highly scalable, highly flexible, distributed, wide-column data store


The win column
The real work begins when you must tune a cluster for your particular application. Given the size of the data sets involved and the complexity of building and managing a multinode cluster, which often spans multiple data centers, tuning is hardly straightforward. It demands a solid understanding of the interplay among the cluster's memory caching, disk storage, and internode communications, and it requires careful monitoring of cluster behavior.

It's true that HBase's reliance on ZooKeeper, a separate application, introduces an additional point of failure (and the attendant difficulty of troubleshooting the source of a problem) that Cassandra avoids. But it isn't the case that tuning a Cassandra cluster is orders of magnitude less difficult. In the end, comparing the travails of tuning clusters for the two databases, it's probably a wash.

Which means, as usual, there is no clear winner or loser. You'll find zealots for both databases, and each camp will present compelling evidence demonstrating the superiority of their system. And as usual, you'll face the chore of taking each for a test drive and benchmarking them against your target application. But given the scope of these technologies, could there be any other way?

Cassandra vs. HBase at a glance

Cassandra
Pros
  • Symmetric architecture makes it relatively easy to create and scale large clusters
  • SQL-like Cassandra Query Language eases developers' transition from RDBMS
  • Allows you to tune for performance or consistency or a balance of both
  • Community edition of management GUI available
  • Good documentation (provided by DataStax)
Cons
  • Configuration is complex
  • Current trigger/stored procedure mechanism experimental
Platforms: CentOS, Red Hat, Debian, Ubuntu, Mac OS X, Windows
Cost: Free, open source under the Apache License version 2.0

HBase
Pros
  • Built-in versioning
  • Strong consistency at the record level
  • Provides RDBMS-like triggers and stored procedures through coprocessors
  • Built on tried-and-true Hadoop technologies
  • Active development community
Cons
  • Management GUI difficult to get up and running
  • Lacks friendly, SQL-like query language
  • Lots of moving parts
  • Setup beyond a single-node development cluster can be difficult
Platforms: Requires Java SE version 6; can be run on Windows using Cygwin
Cost: Free, open source under the Apache License version 2.0
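The Cassandra entry "tune for performance or consistency or a balance of both" refers to the consistency level chosen per read and per write. The arithmetic behind that trade-off can be sketched roughly as follows: with a replication factor N, a write acknowledged by W replicas and a read answered by R replicas are guaranteed to share at least one replica whenever R + W > N. This is a simplified sketch that assumes a single data center and ignores replica failures, hinted handoff, read repair, and timestamp conflict resolution; the function names are hypothetical and are not part of any Cassandra driver API.

```python
def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1 (single data center assumed)."""
    return n // 2 + 1


def overlaps(n: int, r: int, w: int) -> bool:
    """Return True when every set of r read replicas must intersect every set
    of w write replicas (R + W > N), so a read reaches at least one replica
    that acknowledged the latest successful write."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3          # hypothetical replication factor
    q = quorum(n)  # majority of 3 replicas
    # (label, replicas that answer a read, replicas that acknowledge a write)
    choices = [
        ("read ONE, write ONE", 1, 1),
        ("read QUORUM, write QUORUM", q, q),
        ("read ONE, write ALL", 1, n),
    ]
    for label, r, w in choices:
        print(f"{label}: R={r} W={w} overlap={overlaps(n, r, w)}")
```

Reading and writing at ONE favors latency and availability but gives no overlap guarantee; QUORUM for both, or writing at ALL and reading at ONE, trades some latency for that guarantee.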

This article, "Big data showdown: Cassandra vs. HBase," was originally published at InfoWorld.com.
