Bigtable-inspired open source projects take different routes to the highly scalable, highly flexible, distributed, wide column data store
Meanwhile, though Cassandra is described as having "eventual" consistency, both read and write consistency can be tuned, not only by level, but in extent. That is, you can configure not only how many replica nodes must successfully complete the operation before it is acknowledged, but also whether the participating replica nodes span data centers.
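The arithmetic behind level and extent can be sketched in a few lines. The level names below match Cassandra's consistency levels, but the function, the two-data-center topology, and the replication factors are illustrative assumptions, and the model ignores details such as hinted handoff and read repair.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Smallest majority of `replicas`."""
    return replicas // 2 + 1


def required_acks(level: str, rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """Replica acknowledgements needed before the client gets an answer."""
    total_rf = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":   # majority in the coordinator's data center only
        return quorum(rf_by_dc[local_dc])
    if level == "QUORUM":         # majority of all replicas, wherever they live
        return quorum(total_rf)
    if level == "EACH_QUORUM":    # a majority in every data center
        return sum(quorum(rf) for rf in rf_by_dc.values())
    if level == "ALL":
        return total_rf
    raise ValueError(f"unsupported level: {level}")


# Hypothetical topology: two data centers with three replicas each.
rf = {"dc1": 3, "dc2": 3}
for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, rf, local_dc="dc1"))
```

With this assumed topology, LOCAL_QUORUM waits for two replicas in the local data center, while QUORUM and EACH_QUORUM both wait for four. The difference is that EACH_QUORUM insists on two from each data center, which is the "extent" half of the tuning.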
Further, Cassandra has added lightweight transactions to its repertoire. Cassandra's lightweight transaction is a "compare and set" mechanism roughly comparable to HBase's "check and put" capability; HBase also has a "read-check-delete" operation for which Cassandra has no counterpart. Finally, Cassandra's 2.0 release added row-level write isolation: If a client updates multiple columns in a row, other clients will see either none of the updates or all of the updates.
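To make the "compare and set" idea concrete, here is a minimal in-memory sketch. The ToyTable class and its method names are hypothetical and are not the HBase or Cassandra client API. A single-process lock stands in for the coordination a real cluster has to perform across replicas.

```python
import threading
from typing import Dict, Optional


class ToyTable:
    """In-memory stand-in for a table; not a real HBase or Cassandra client."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._lock = threading.Lock()

    def check_and_put(self, row_key: str, column: str,
                      expected: Optional[str], new_value: str) -> bool:
        """Write new_value only if the column currently holds `expected`.

        expected=None means "only if the column is absent".
        """
        with self._lock:  # the check and the write happen as one step
            current = self._rows.get(row_key, {}).get(column)
            if current != expected:
                return False  # someone else got there first; nothing is written
            self._rows.setdefault(row_key, {})[column] = new_value
            return True

    def get(self, row_key: str, column: str) -> Optional[str]:
        with self._lock:
            return self._rows.get(row_key, {}).get(column)


table = ToyTable()
first = table.check_and_put("user:42", "email", None, "a@example.com")
stale = table.check_and_put("user:42", "email", None, "b@example.com")
update = table.check_and_put("user:42", "email", "a@example.com", "c@example.com")
print(first, stale, update, table.get("user:42", "email"))
```

The second call fails because the column is no longer absent, so the stale writer learns that its assumption was wrong instead of silently overwriting the first value.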
In both Cassandra and HBase, the primary index is the row key, but data is stored on disk such that column family members are kept in close proximity to one another. It is therefore important to carefully plan the organization of column families. To keep query performance high, columns with similar access patterns should be placed in the same column family. Cassandra lets you create additional, secondary indexes on column values. This can improve data access in columns whose values have a high level of repetition -- such as a column that stores the state field of a customer's mailing address. HBase lacks built-in support for secondary indexes, but offers a number of mechanisms that provide secondary index functionality. These are described in HBase's online reference guide and on HBase community blogs.
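Conceptually, a secondary index is a second lookup table that maps a column value back to the row keys that hold it. The sketch below shows that idea in memory. The IndexedTable class is hypothetical, and it says nothing about how Cassandra stores its indexes on disk.

```python
from collections import defaultdict
from typing import Dict, Set


class IndexedTable:
    """Toy table with a primary index (row key) and one secondary index."""

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.rows: Dict[str, Dict[str, str]] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        merged = {**self.rows.get(row_key, {}), **columns}
        self.rows[row_key] = merged
        new = merged.get(self.indexed_column)
        if old != new:  # keep the index in step with the indexed column
            if old is not None:
                self.index[old].discard(row_key)
            if new is not None:
                self.index[new].add(row_key)

    def find_by_value(self, value: str) -> Set[str]:
        """Row keys whose indexed column equals `value`, without a full scan."""
        return set(self.index.get(value, set()))


customers = IndexedTable("state")
customers.put("cust1", {"name": "Ann", "state": "OH"})
customers.put("cust2", {"name": "Bob", "state": "OH"})
customers.put("cust3", {"name": "Cy", "state": "TX"})
customers.put("cust2", {"state": "TX"})  # Bob moves; the index entry follows
print(sorted(customers.find_by_value("TX")))
```

Because many customers share each state, every index entry points at a large group of rows, which is the high-repetition case the paragraph above describes.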
As stated earlier, both databases have command-line shells for issuing data manipulation commands. HBase's shell is built on the JRuby shell, so you can write scripts that employ all of the JRuby shell's resources to interact with the APIs HBase provides. Cassandra's CQL shell, cqlsh, is implemented in Python rather than JRuby. In addition, Cassandra has defined CQL, modeled after SQL. CQL is far richer than the query language used by HBase, and it can be executed directly in Cassandra's shell.
In fact, Cassandra is moving toward CQL as the database's primary programming interface, though Cassandra still supports the Thrift API. (Thrift is language independent, but it's now considered a legacy API.) The Cassandra documentation lists drivers for Java, C#, and Python, all of which employ CQL version 3. Finally, a JDBC driver is also available for Cassandra; it uses CQL in place of SQL for both data definition and data manipulation.
HBase's native Java API provides the richest functionality to programmers, though HBase also sports the language-agnostic Thrift interface, as well as a RESTful Web service interface. While the data manipulation commands of HBase are not as rich as CQL, HBase does have a "filter" capability that executes on the server side of a session and improves scanning (search) throughput.
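The sketch below suggests why a filter that runs on the server improves scan throughput: fewer rows have to cross the network. ToyServer, its rows_sent counter, and the sample data are assumptions made for illustration, not HBase's filter API.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

Columns = Dict[str, str]
Row = Tuple[str, Columns]


class ToyServer:
    """Simulated region server that counts rows sent to the client."""

    def __init__(self, rows: Dict[str, Columns]) -> None:
        self._rows = rows
        self.rows_sent = 0  # rows that crossed the simulated network

    def scan(self, row_filter: Optional[Callable[[Columns], bool]] = None) -> Iterator[Row]:
        for key in sorted(self._rows):  # scans walk rows in key order
            cols = self._rows[key]
            if row_filter is None or row_filter(cols):
                self.rows_sent += 1
                yield key, cols


def is_error(cols: Columns) -> bool:
    return cols["status"] == "error"


# Hypothetical data set: 1,000 rows, every hundredth one marked as an error.
data = {f"row{i:04d}": {"status": "error" if i % 100 == 0 else "ok"}
        for i in range(1000)}

client_side = ToyServer(data)  # fetch everything, then filter in the client
hits_client = [key for key, cols in client_side.scan() if is_error(cols)]

server_side = ToyServer(data)  # hand the predicate to the server
hits_server = [key for key, _ in server_side.scan(is_error)]

print(client_side.rows_sent, server_side.rows_sent)
```

Both approaches find the same ten rows, but the client-side version ships all 1,000 rows to do it, while the server-side version ships only the ten matches.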
HBase has also introduced "coprocessors," which allow the execution of user code in the context of the HBase processes. The result is roughly comparable to the relational database world's triggers and stored procedures. Cassandra currently has no counterpart to HBase's coprocessors.
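The trigger-like side of coprocessors can be pictured as hooks that run inside the server around each write. The following is a toy model of that pattern only. ObservedTable and its hook-registration methods are hypothetical and do not reproduce HBase's coprocessor interfaces.

```python
from typing import Callable, Dict, List

Columns = Dict[str, str]
Hook = Callable[[str, Columns], None]


class ObservedTable:
    """Toy table that runs user-supplied hooks before and after each put."""

    def __init__(self) -> None:
        self.rows: Dict[str, Columns] = {}
        self._pre_put: List[Hook] = []
        self._post_put: List[Hook] = []

    def on_pre_put(self, hook: Hook) -> None:
        self._pre_put.append(hook)

    def on_post_put(self, hook: Hook) -> None:
        self._post_put.append(hook)

    def put(self, row_key: str, columns: Columns) -> None:
        for hook in self._pre_put:
            hook(row_key, columns)  # a hook may raise to veto the write
        self.rows.setdefault(row_key, {}).update(columns)
        for hook in self._post_put:
            hook(row_key, columns)  # runs only after the write has succeeded


audit_log: List[str] = []


def reject_empty_email(row_key: str, columns: Columns) -> None:
    if columns.get("email") == "":
        raise ValueError(f"{row_key}: email must not be empty")


def record_write(row_key: str, columns: Columns) -> None:
    audit_log.append(f"put {row_key} {sorted(columns)}")


table = ObservedTable()
table.on_pre_put(reject_empty_email)
table.on_post_put(record_write)

table.put("u1", {"email": "a@example.com"})
try:
    table.put("u2", {"email": ""})
    rejected = False
except ValueError:
    rejected = True
print(rejected, audit_log)
```

The validation hook behaves like a "before" trigger and the audit hook like an "after" trigger. In both cases the user code runs next to the data rather than in the client.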
Cassandra's documentation is noticeably better than HBase's, and good documentation certainly flattens the learning curve. In my experience, setting up a development Cassandra cluster is simpler than setting up an HBase cluster. Of course, this is only important for development and testing purposes.