DataStax Enterprise is the commercial distribution of Apache Cassandra, a column-family NoSQL database originally developed at Facebook and probably best known for powering Netflix. The new 4.5 release of DataStax Enterprise, announced June 30, advances DataStax's case that NoSQL is ready for enterprise applications. It features Apache Spark integration for fast in-memory analytics, Hortonworks and Cloudera integration for easy access to Hadoop data, and new diagnostic and security tools.
Probably the most visible new feature in DSE 4.5 is integration with Spark, which enables execution of advanced data analytics in memory. This is a significant change from MapReduce, which requires that all intermediate and final results be written to disk. That's how Spark can claim speeds up to 100 times faster than MapReduce for the same computation. By running Spark on top of Apache Cassandra, DataStax Enterprise 4.5 becomes the first platform to offer users the ability to perform computations on Cassandra data in near-real time.
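To make the disk-versus-memory distinction concrete, here is a toy sketch in plain Python. It is not Spark, MapReduce, or DataStax code, and every function name in it is hypothetical. It runs the same two-stage word count twice: once writing the intermediate pairs to a temporary file between stages, as a MapReduce-style job would, and once passing them directly to the next stage in memory.

```python
import json
import os
import tempfile
from collections import Counter


def map_stage(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_stage(pairs):
    """Sum the counts for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_via_disk(lines):
    """MapReduce-style: persist intermediate pairs to disk, then read them back."""
    pairs = map_stage(lines)
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(pairs, handle)
        path = handle.name
    try:
        with open(path) as handle:
            reloaded = [tuple(pair) for pair in json.load(handle)]
    finally:
        os.remove(path)  # clean up the intermediate file
    return reduce_stage(reloaded)


def run_in_memory(lines):
    """Spark-style: hand the intermediate pairs straight to the next stage."""
    return reduce_stage(map_stage(lines))


LINES = ["hot data in Cassandra", "historical data in Hadoop"]
disk_result = run_via_disk(LINES)
memory_result = run_in_memory(LINES)
```

Both paths produce the same counts; the only difference is the extra serialization and file round trip in the first, which is the overhead an in-memory engine avoids at every stage boundary.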
DataStax has also bundled Shark in the new release. Shark gives users the ability to run Hive queries using the Spark engine. Now DataStax users who have been running batch analytics in Hive will be able to run those jobs quickly in memory without needing to port their HiveQL code to Cassandra Query Language.
Spark integration continues efforts by DataStax to increase speed and performance. With the 4.0 release in February, DataStax introduced the ability to run transactional workloads on Cassandra data in memory. Now with DSE 4.5, users will be able to leverage both in-memory features to run their entire workloads, transactional and analytical, in memory. This opens up the ability to run fast reads and writes alongside fast analytics over a unified database, something that until now has been out of reach in the Hadoop ecosystem.
DataStax Enterprise 4.5 offers many other new features beyond Spark integration. As part of the release, DataStax has announced official partnerships with commercial Hadoop distributors Hortonworks and Cloudera. This means that DataStax customers will now be able to query across their DataStax database and Hadoop installation simultaneously. According to Schumacher, "customers can now run a Hive query on our platform that joins together a Cassandra table and then an external Cloudera [or Hortonworks] Hive table in the same query."
Furthermore, users can then store the results in their DataStax database or remotely in their commercial Hadoop installation. This is a significant development for customers needing a way to integrate hot data in a DataStax Cassandra database with historical data stored in a commercial Hadoop installation.
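The shape of such a cross-store query can be sketched with a stand-in. The example below uses Python's built-in sqlite3 module rather than Hive or DataStax Enterprise, so it is only an analogy: two separate in-memory databases play the roles of the Cassandra side (`main`) and the external Hadoop side (`archive`), and all table and column names are invented for illustration. A single query joins recent rows with historical rows and stores the result back in one of the two stores.

```python
import sqlite3

# "main" stands in for the hot operational store, "archive" for the
# external historical store. Both are hypothetical stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")

conn.execute("CREATE TABLE main.recent_orders (customer_id INTEGER, amount REAL)")
conn.execute(
    "CREATE TABLE archive.customer_history (customer_id INTEGER, lifetime_total REAL)"
)
conn.executemany(
    "INSERT INTO main.recent_orders VALUES (?, ?)",
    [(1, 20.0), (2, 35.5), (1, 5.0)],
)
conn.executemany(
    "INSERT INTO archive.customer_history VALUES (?, ?)",
    [(1, 100.0), (2, 250.0)],
)

# One query joins a table from each store and writes the result
# into the hot store as a new table.
conn.execute(
    """
    CREATE TABLE main.customer_summary AS
    SELECT r.customer_id,
           SUM(r.amount) AS recent_total,
           h.lifetime_total
    FROM main.recent_orders AS r
    JOIN archive.customer_history AS h ON h.customer_id = r.customer_id
    GROUP BY r.customer_id, h.lifetime_total
    """
)

summary = conn.execute(
    "SELECT customer_id, recent_total, lifetime_total "
    "FROM main.customer_summary ORDER BY customer_id"
).fetchall()
```

Writing the result to `archive.customer_summary` instead would mirror the other option described above, storing the output remotely rather than in the operational database.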
Formally, DataStax will support such integration for only the current and most recent prior versions of Hortonworks and Cloudera Hadoop. If you're willing to forgo the benefits of formal support, however, you'll also be able to integrate your DataStax Enterprise installation with an open source Hadoop installation.
DSE 4.5 offers several other perks as well. The version of OpsCenter, DataStax's visual cluster management interface, that now ships with the release supports clusters of up to 1,000 nodes. The new release also comes with improved diagnostic and security tools.