Controversial unified-Hadoop project bears first fruit

Hortonworks and Pivotal's Open Data Platform project yields its first tangible results, but they're unlikely to dispel controversy over ODP's mission


The ODP (Open Data Platform), a consortium of vendors attempting to produce a base Hadoop distribution, is bearing its first real fruit -- but it's unlikely to satisfy critics.

Two of the key figures in ODP, Pivotal and Hortonworks, have collaborated to certify Pivotal's Hawq, an SQL interface for Hadoop data, on the HDP (Hortonworks Data Platform) distribution of Hadoop.

Hawq was, and is, a key component of Pivotal's Big Data Suite, a set of components for Hadoop that Pivotal had previously only made available as a proprietary product. Earlier this year, Pivotal relented and open-sourced the Big Data Suite components -- Hawq, the Gemfire NoSQL database, and the Greenplum analytics systems -- under various licenses.

Now, Pivotal is ensuring those pieces will work as-is with HDP and claims this "moves away from a proprietary management and configuration framework to an open source, Hadoop-native environment," with a lower TCO as a benefit.

Pivotal is pitching Hawq at enterprises that are investing in HDP but also want "a strong SQL engine to build their analytics use cases, offload tasks from traditional Enterprise Data Warehouses and execute them at Hadoop scale."

Satisfying enterprises is only part of the picture with this collaboration. It's also meant to serve as an example use case of the ODP in action.

Because all ODP-based Hadoop distributions share a common set of underlying components, it's theoretically easier for developers building on the Hadoop platform to extend it without being tied to the behaviors of any one Hadoop distribution.

But is the ODP needed to make collaborations like this happen? That remains a tough question, even if it's raised mainly by competitors with a strong interest in supplying customers with their own, sometimes proprietary, solutions.

Since its unveiling, the ODP has inspired at least as much dissension and controversy as it has won adherents and contributors. Cloudera, for instance, has elected not to participate, and MapR -- maker of a less well-known but still significant Hadoop distribution -- has also declined to join. In MapR's eyes, the existing governance of Hadoop by the Apache Software Foundation makes efforts like the ODP redundant, and the "core" as defined by the ODP is "vendor-biased."

One of those Apache-sponsored core projects, Ambari, is a configuration management system for Hadoop components that Hawq itself integrates with. MapR claims that Ambari, sponsored mainly by Pivotal and Hortonworks, is "used by less than 25 percent of the market" and thus doesn't make a good case for being a core offering. (MapR integrates with other projects, such as Juju and ZooKeeper, for configuration management, but one of its offerings is MapR-FS, a proprietary replacement for HDFS.)

"Project and subproject interoperability [in Hadoop] are very good and guaranteed by both free and paid-for distributions," MapR said in its blog post on the subject. "Applications built on one distribution can be migrated with virtually zero switching costs to the other distributions."


Copyright © 2015 IDG Communications, Inc.