Big data showdown: Cassandra vs. HBase
Bigtable-inspired open source projects take different routes to the highly scalable, highly flexible, distributed, wide column data store
The win column
The real work begins when you must tune a cluster for your particular application. Given the size of the data sets involved and the complexity of building and managing a multinode cluster (which often spans multiple data centers), tuning is hardly straightforward. It demands a solid understanding of the interplay among the cluster's memory caching, disk storage, and internode communications, and it requires careful monitoring of cluster behavior.
It's true that HBase's reliance on ZooKeeper -- a separate application -- introduces an additional point of failure (and the attendant difficulty of tracing a problem to its source) that Cassandra avoids. But it isn't the case that tuning a Cassandra cluster is orders of magnitude less difficult. In the end, when you compare the travails of cluster tuning for the two databases, it's probably a wash.
Which means, as usual, there is no clear winner or loser. You'll find zealots for both databases, and each camp will present compelling evidence demonstrating the superiority of its system. And as usual, you'll face the chore of taking each for a test drive and benchmarking it against your target application. But given the scope of these technologies, could there be any other way?
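For readers planning that test drive, here is a minimal sketch of a latency harness. It assumes nothing about either database: `benchmark` and `fake_write` are hypothetical names, and the in-memory dictionary is only a stand-in for a real write issued through whichever Cassandra or HBase client you use.

```python
import math
import statistics
import time
from typing import Callable, Iterable


def benchmark(operation: Callable[[int], None], keys: Iterable[int]) -> dict:
    """Call `operation` once per key and summarize latency in milliseconds."""
    latencies = []
    for key in keys:
        start = time.perf_counter()
        operation(key)
        latencies.append((time.perf_counter() - start) * 1000.0)
    if not latencies:
        raise ValueError("no keys supplied, nothing to measure")
    latencies.sort()
    # Nearest-rank 99th percentile: the smallest sample with at least
    # 99% of all samples at or below it.
    p99_index = max(0, math.ceil(0.99 * len(latencies)) - 1)
    return {
        "count": len(latencies),
        "mean_ms": statistics.fmean(latencies),
        "median_ms": statistics.median(latencies),
        "p99_ms": latencies[p99_index],
    }


# Stand-in workload (an assumption for illustration): an in-memory dict
# takes the place of a real cluster write.
store = {}


def fake_write(key: int) -> None:
    store[key] = b"payload"


result = benchmark(fake_write, range(1000))
print(result)
```

Running the same harness with the same key sequence against each candidate cluster gives comparable numbers, though a realistic evaluation would also need concurrent clients, your application's own access pattern, and a data set too large to fit in cache.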
| |Cassandra|HBase|
|---|---|---|
|Platforms|CentOS, Red Hat, Debian, Ubuntu, Mac OS X, Windows|Requires Java SE version 6; can be run on Windows using Cygwin|
|Cost|Free, open source under the Apache License version 2.0|Free, open source under the Apache License version 2.0|
This article, "Big data showdown: Cassandra vs. HBase," was originally published at InfoWorld.com.