If there's a mantra chanted by almost every Hadoop vendor, it's one lifted from Henry David Thoreau's quote book: "Simplify, simplify." Or maybe: Facilitate, facilitate.
With HDP 2.3, its latest Hadoop distribution, Hortonworks emphasizes ease of adoption and administration. Setting up and running Hadoop still isn't a walk in the park, but will fixing the process boost Hadoop's adoption as Hortonworks (among others) expects?
Easier is better...
In a phone conversation, Tim Hall, vice president of product management at Hortonworks, described the company's mission to make Hadoop less ornery to deploy as "getting rid of the command line." At the very least, Hadoop users would need to invoke the command line only by choice.
From the customer feedback Hortonworks gathered, the four items most in need of simpler configurations (the "top knobs you have to twiddle") were HDFS, YARN, the Hive real-time query system, and HBase. To that end, Hortonworks endeavored to streamline and clarify the setup process for those components.
Some of the work done with this update involves specific tools. For instance, developers now have a SQL builder for Hive that can document a given query's separation and distribution across clusters.
Hortonworks has been keeping an eye out for the myriad ways Hadoop can be made easier to deploy. Back in April, Hortonworks acquired SequenceIQ, a Budapest-based creator of Hadoop deployment tools for containers and clusters. Some are convinced that Hadoop runs best on bare metal, but Hortonworks' ambitions seem more about support for a breadth of deployment options.
Another area where Hortonworks has been trying to distinguish itself is in Hadoop's conformance to data governance procedures. Thus, Hortonworks has added Apache Atlas to HDP. Atlas is an overall data governance framework in which data can be searched and audited while still retaining any anonymization, data masking, or other compliance requirements. It echoes the data virtualization work done elsewhere in the industry, and it makes sense to have a single underlying (and open source) principle in Hadoop.
... but will it be right?
Hortonworks is clearly banking on the combination of elements to make Hadoop more interesting to enterprises, even as questions arise as to whether difficulty of adoption, configuration, or deployment is the real barrier.
Hall was also quick to discount the idea that pieces within Hadoop that have achieved their own fame, like Spark, are prepared to become self-contained ecosystems -- especially if Hadoop turns out to be less broadly accepted than anticipated.
"Spark is an interesting engine that runs nicely within Hadoop," said Hall, "and the power with Spark is having it work in a better and more integrated fashion with the Hadoop ecosystem, such as integrating Spark with HBase. We're looking at how it can be better together within the platform, not as a platform in itself."
Hortonworks has the numbers to back up its own bullishness, at least, and its devotion to open source as a way of life is laudable. The next step will be to find out which has more of a future: Making Hadoop easier to work with or figuring out what enterprises really want.