Cloud-hosted Hadoop as a service now does SQL

Altiscale's Hadoop-as-a-service in the cloud now has SQL querying courtesy of Apache's Hive project

Stephen Sauer

Altiscale, maker of a cloud-hosted Hadoop solution, is expanding its service to include running SQL queries, a feature of broad interest to Hadoop users.

There are already several ways to query Hadoop with SQL. Altiscale is concentrating on using the Apache Hive query engine, the first of the SQL query systems for Hadoop, in its Altiscale Data Cloud service. Along with version 0.13.1 of Hive (the most recent official release) and version 0.4.0 of the Tez framework, Altiscale offers a Web-based SQL query tool.

For those more comfortable using a third-party tool like Microsoft Excel for hands-on number-crunching, Altiscale has partnered with Simba, makers of an ODBC interface for Hadoop, to allow those tools to plug into the service.

Most of the recent transformative work with Hive has been done by Hortonworks, whose Stinger project helped juice Hive performance and roll in previously unsupported SQL functions, such as the EXISTS/NOT EXISTS keywords. One minor disadvantage of using Hive is that queries are effectively read-only; it isn't possible to modify existing rows with UPDATE or DELETE, or to insert individual rows, so interactive work is largely limited to SELECT.
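For readers unfamiliar with the EXISTS/NOT EXISTS keywords mentioned above, here is a minimal sketch of what they do. It runs against Python's built-in SQLite engine as a stand-in, not against Hive or Altiscale's service, and the `customers` and `orders` tables are hypothetical sample data. Hive's dialect places its own restrictions on subqueries, so treat this as an illustration of the SQL semantics only.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
# The tables and rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 25.0), (1, 40.0), (3, 12.5);
""")

# EXISTS keeps a customer if the correlated subquery returns at least one row.
with_orders = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""")]

# NOT EXISTS keeps a customer only if the subquery returns no rows.
without_orders = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.name
""")]

print("Customers with orders:", with_orders)
print("Customers without orders:", without_orders)
```

Both queries are plain SELECT statements, so they fit within the read-only limitation described above; the INSERT statements here only build the sample data in SQLite.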

Altiscale says its service is a painless way to set up Hadoop and to leverage it using an organization's existing analytics tools. Nearly any on-premises approach to Hadoop takes "at least six months" to build out, according to Altiscale founder and CEO Raymie Stata. Major Hadoop vendors like Red Hat and Hortonworks seem to concur, based on their idiosyncratic approaches to that problem; in Red Hat's case, it's integrating Hadoop setup with Red Hat Enterprise Linux.

A major caveat hangs over the use of any cloud-hosted Hadoop: Hadoop works best as a way to bring processing to data rather than vice versa. A startup that's beginning its work with Hadoop and thus has little in the way of an existing data store should find it easier to adopt a cloud-based Hadoop solution, but it's typically less appealing if you're already sitting on top of thousands of terabytes of data.

Altiscale doesn't agree. Steve Kishi, vice president of product management, claims that moving data to Altiscale has not been an impediment to its customers. "Many of Altiscale’s customers have data volumes in the hundreds of terabytes and transfer terabytes of data each week," he stated in an email. "These volumes can be handled over standard networking."
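A rough back-of-the-envelope calculation helps put that claim in perspective. The sketch below is not from Altiscale; the weekly volumes and the 1 Gbit/s link are illustrative assumptions, and it assumes decimal terabytes, a fully utilized link, and no protocol overhead, so real transfers would take longer.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400 seconds


def required_mbps(terabytes_per_week: float) -> float:
    """Sustained megabits per second needed to move a weekly volume."""
    bits = terabytes_per_week * 1e12 * 8  # decimal terabytes -> bits
    return bits / SECONDS_PER_WEEK / 1e6


def days_to_transfer(terabytes: float, link_mbps: float) -> float:
    """Days needed to move a one-time volume over a fully used link."""
    bits = terabytes * 1e12 * 8
    return bits / (link_mbps * 1e6) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical weekly volumes, chosen only for illustration.
    for tb in (1, 5, 10):
        print(f"{tb} TB/week needs about {required_mbps(tb):.1f} Mbit/s sustained")
    # Hypothetical one-time bulk load of an existing data store.
    print(f"100 TB over 1 Gbit/s takes about {days_to_transfer(100, 1000):.1f} days")
```

Under these assumptions, a few terabytes a week needs only tens of megabits per second of sustained throughput, while an initial bulk load of hundreds of terabytes is the part that takes days or weeks, which is the scenario the caveat about existing data stores is aimed at.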