Pivotal, an EMC/VMware spin-off that has big plans to deliver big data analytics through platform as a service, has whisked the drapes off Pivotal HD 2.0, its commercially supported enterprise-grade distribution of Hadoop.
But Pivotal's ambitions for HD don't simply involve delivering Hadoop as a free-form building block, albeit one that's professionally supported. Rather, HD is intended to be the data fabric of the company's own Pivotal One, a PaaS offering where companies can develop apps that siphon data in real time from a variety of sources and transform it into actionable information.
HD 2.0 is built on top of Apache Hadoop 2.2, but adds a good deal of proprietary technology -- a move that will likely leave open source purists wincing -- to make Hadoop the substrate of what Pivotal calls a "business data lake" architecture. One of those proprietary pieces is HAWQ, a SQL query engine designed to perform parallelized queries on data stored in HDFS; another component, GemFire XD, is an in-memory database service designed more for processing of incoming data in real time, as opposed to long-running SQL queries. HD 2.0 also includes GraphLab, a graph analytics algorithms package, and tools to allow programmers using R, Python, and Java to "enable business logic and procedures otherwise cumbersome with SQL."
Other distributions have done little more than package up Hadoop for easier delivery and provisioning under the assumption that the deploying parties would know best how to make the most of it -- an attitude that's persisted with Red Hat and Hortonworks joining forces for the sake of supporting Hadoop in Red Hat Enterprise Linux. There, the application and data-access sides have largely consisted of the likes of Red Hat's JBoss data layer. Enterprise developers still have to fit many more of the pieces together themselves.
Pivotal, on the other hand, is using Hadoop as an underlying stratum on which to build its PaaS. To that end, Pivotal One is meant to be directly useful to enterprises needing big data analytics by allowing them to leverage more of the data-access paradigms they're already familiar with (such as SQL) instead of forcing them to scrap everything and learn the Hadoop way. Again, Hadoop purists aren't going to be happy with this news, but Pivotal most wants to satisfy its enterprise customers with big data needs.
When InfoWorld's Eric Knorr pondered the launch of Pivotal back in April 2013, he considered the possibility that Pivotal One was being built as much for Pivotal itself as it was for anyone else -- that Pivotal Labs (one of the acquisitions used to form Pivotal) would be "developing the bulk of those next-gen big data applications on Pivotal One for its enterprise customers, rather than enterprise developers using Pivotal One themselves."
The long-term vision, as Knorr found in his discussion later in 2013 with CEO Paul Maritz, involves not just having the ability to generate a given insight with a large data set or even to run arbitrarily large software on top of it. Rather, as he put it, "It's about how you use that in the context of some application that's going to drive a transaction or cause some interaction with the user.... We're not just in the big data business. We're in the applications and data business."
Much of what has held back Hadoop is its status as a technology rather than a product -- as cited by Facebook analytics chief Ken Rudin when talking about how "big data is about business needs." If Pivotal One and Pivotal HD make Hadoop into the kind of useful and even transformative business product that Red Hat was able to craft from Linux, odds are it'll be at least as big a win for Pivotal as it will be for Hadoop -- and maybe even bigger.
This story, "Pivotal juices Hadoop with in-memory database and SQL querying," was originally published at InfoWorld.com.