Spark speeds up SQL on Hadoop in Splice Machine 2.0

Apache Spark is being used to enhance yet another third-party data processing product

Apache Spark rose to fame as an in-memory data processing framework frequently used with Hadoop, but it's fast transforming into a nucleus for building other data-processing products.

The newly released version 2.0 of Splice Machine, an RDBMS that runs SQL on Hadoop, uses Spark as one of two processing engines. Incoming work is routed to one engine or the other depending on whether it's an OLTP or an OLAP workload.

Splice Machine originally made a name for itself as a replacement for conventional ACID RDBMSes like Oracle on multiterabyte workloads. The company claimed one former Oracle customer's workloads ran an order of magnitude faster, and Hadoop's native scale-out architecture meant the solution could grow with the size of workloads at a lower cost than a conventional RDBMS.

Monte Zweben, co-founder and CEO of Splice Machine, said in an interview that Splice Machine's big innovation is letting OLTP and OLAP workloads run side by side on the same data and the same architecture, but with different processing engines, making it easier to act on that data in business decisions.

"We have an architecture that can identify whatever query comes into the system, determine whether it's OLTP or OLAP, and send the query to the right computational engine," Zweben said. Transactional queries are run under HBase; OLAP queries are processed via Spark. This also allows memory and CPU usage for each kind of query to be kept segregated.

Adding Spark to Splice Machine may have been inevitable, as Mike Franklin, one of the company's advisory board members and a chair in the computer science department at UC Berkeley, is director of AMPLab, where Spark originated.

Spark's original aim was to give data scientists an easy way to perform the kinds of data processing that once required a lot of custom code. Spark has already been used to rewrite IBM's DataWorks data-transformation product. In Splice Machine's case, though, it adds entirely new functionality rather than simply enhancing an existing product.

Spark notwithstanding, Splice Machine faces stiff competition in a field that is growing more crowded by the minute. The database field offers a wealth of possibilities -- NoSQL, NewSQL, and in-memory processing -- many of which are designed to satisfy extremely specific use cases at high speed. Existing database vendors like Microsoft, Oracle, and Postgres are all upping their games to compete with NoSQL and in-memory DB offerings, and Hadoop vendors are spicing up their distributions to satisfy the need for fast analytics results.

While one of Splice Machine's selling points is that it lets customers reuse their existing ANSI SQL, the compatibility and speed issues that hamper rival SQL-on-NoSQL solutions will only become easier to surmount with time.
