Corks popped earlier this week in the halls of the Apache Software Foundation, as its Apache HBase project finally reached 1.0 status after more than four years of work.
HBase, sometimes called "the Hadoop database," is a distributed, column-oriented key-value store modeled on Google's Bigtable and comparable to Cassandra. Databases consisting of billions of rows and millions of columns can be stored in HBase and retrieved through its own client APIs (SQL access requires an added layer such as Apache Phoenix), and an HBase database can scale out by simply adding nodes to an existing cluster. While HBase has been a key component of Hadoop, it has become its own entity in the NoSQL world, though it faces stiff competition.
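To make the "column-oriented key-value store" description concrete, here is a minimal in-memory sketch in Python. It is a toy model and not HBase's API: the class `ToyTable` and its methods are hypothetical names chosen for illustration, and the sketch ignores cell versions, distribution across nodes, and persistence. It only shows that each value is addressed by a row key plus a column family and qualifier, and that rows are kept sorted by key so a range scan is a simple walk.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Toy sorted, column-oriented key-value table (hypothetical, not HBase's API)."""

    def __init__(self):
        self._row_keys = []  # row keys, kept in sorted order
        self._rows = {}      # row key -> {(family, qualifier): value}

    def put(self, row, family, qualifier, value):
        # New rows are inserted at their sorted position.
        if row not in self._rows:
            insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row][(family, qualifier)] = value

    def get(self, row, family, qualifier):
        # Returns None when the row or the column is absent.
        return self._rows.get(row, {}).get((family, qualifier))

    def scan(self, start, stop):
        """Yield (row, cells) for start <= row < stop, in key order."""
        i = bisect_left(self._row_keys, start)
        while i < len(self._row_keys) and self._row_keys[i] < stop:
            key = self._row_keys[i]
            yield key, dict(self._rows[key])
            i += 1
```

Because rows are ordered by key, choosing keys with a shared prefix (for example `user#001`, `user#002`) keeps related rows adjacent, so `scan("user#", "user#~")` visits only those rows.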
In a blog post, Enis Söztutar, Apache HBase PMC member and release manager for HBase 1.0, explained why it had taken so long to reach 1.0 after the 2011 release of version 0.90. For the project to be labeled 1.0, it had to meet three goals:
- Be a stable foundation for future 1.x releases
- Stabilize the way HBase clusters work
- Clean up the client API so that it's easier to maintain going forward
The biggest changes for HBase users and developers, as hinted above, revolve around the public API set, with some APIs on track to be deprecated and others being newly introduced. (All APIs deprecated in HBase 1.x are to be removed completely in 2.x.)
Many other changes under the hood have also been rolled in as part of the past year's testing and refinement. One major example is a new "read availability using timeline consistent region replicas" feature, which lets read-only replica copies of a region serve reads when the primary copy is slow or unavailable, at the cost of possibly stale results, to satisfy the needs of low-latency applications. Such applications are becoming big drivers of innovation in big-data circles; Couchbase, another NoSQL solution that can write to HDFS as HBase does, is good at storing high volumes of data with low latencies.
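The trade-off behind timeline-consistent reads can be sketched with a small simulation. This is a simplified model under stated assumptions, not HBase code: `ToyRegion`, `Replica`, and the `timeline` flag are hypothetical names, there is a single secondary copy, and the secondary is consulted only when the primary is marked offline. Writes go to the primary only; the secondary replays them later in the same order, so it may lag behind but never shows edits out of sequence.

```python
class Replica:
    """One copy of a region's data (toy model)."""

    def __init__(self):
        self.data = {}
        self.applied = 0    # number of logged edits applied so far
        self.online = True


class ToyRegion:
    """Primary plus one read-only secondary (hypothetical, not HBase's API)."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self._log = []      # edits in write order

    def put(self, key, value):
        # Writes are accepted by the primary only.
        self._log.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self._log)

    def replicate(self):
        # The secondary replays missing edits in order, so it trails the
        # primary along the same timeline.
        for key, value in self._log[self.secondary.applied:]:
            self.secondary.data[key] = value
        self.secondary.applied = len(self._log)

    def get(self, key, timeline=False):
        """Return (value, stale). Strong reads use the primary only."""
        if self.primary.online:
            return self.primary.data.get(key), False
        if timeline:
            # Served by the secondary: available, but flagged as possibly stale.
            return self.secondary.data.get(key), True
        raise RuntimeError("primary unavailable; strong read cannot be served")
```

In this model a caller that can tolerate slightly old data passes `timeline=True` and keeps reading during a primary outage, while a caller that needs the latest value gets an error instead of a stale answer.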
Since HBase and Hadoop are closely linked, especially via the HDFS project, changes to HBase are likely to spur near-term releases from Hadoop vendors as they upgrade their products to include the new HBase.
HBase is showing greater ambitions outside of the context of Hadoop. In 2010, HBase became its own stand-alone top-level Apache project and started going toe-to-toe with two of the big names in NoSQL databases: the aforementioned Cassandra and the immensely popular MongoDB.
Against those two, HBase faces tough competition, as its power and capacity are offset by its complexity of installation and use. When InfoWorld compared the two in a recent showdown, Cassandra claimed the edge because it's easier to set up, manage, and learn. Apache is apparently aware of some of these issues, as its documentation (one source of criticism) was "radically revamped" (per Apache) for the 1.0 release.