Hortonworks buys better Hadoop data flow management

Hortonworks' newest acquisition is a prelude to creating an open-source-based data flow management product

Hadoop vendor Hortonworks, fresh off releasing a new version of its distribution, has acquired a company whose framework it wants for handling how data moves into, out of, and alongside Hadoop.

The company is Onyara, and the framework (of which Onyara is a commercial supporter) is the Apache NiFi project, a system for graphically diagramming how data can move through a system.

Hortonworks sees NiFi as a way to create a new data platform for Hadoop that deals with data gathered and acted on in real time from a panoply of devices, "smart" and otherwise. Originally a product of the NSA, NiFi was open-sourced under the agency's Technology Transfer Program, the same declassification effort that provided the SIMP cybersecurity tool.

An example data flow created in Apache NiFi. (Image: Apache Foundation)

Rather than trying to build the functionality into Hadoop directly, Hortonworks is creating a parallel product offering, Hortonworks DataFlow (not to be confused with Google's product of the same name). DataFlow will be sold to enterprises looking for a solution to handle data in motion as well as data at rest.

NiFi is also meant to play well with all the other stars of the Hadoop cast, like Spark (for real-time data processing) and Kafka (messaging). Plans are also on the table for integrating NiFi-controlled flows into Hortonworks' existing Data Governance Initiative, so DataFlow-controlled data can be labeled and tagged even apart from Hadoop itself.

Hortonworks DataFlow is intended to work in parallel with Hadoop, rather than inside it, with data in motion sent to (and extracted from) Hadoop as needed. (Image: Hortonworks)

Adding NiFi to the Hortonworks mix complements Hortonworks' central mission, which is to provide Hadoop and related products without proprietary encumbrances. But all signs suggest that pure open source plays of any kind face increasingly tough sledding.

Hortonworks' recent financial news has been mixed, with net losses up despite a growing customer base and increasing quarterly revenue. DataFlow comes off as an attempt to give the company a new revenue stream by selling more to existing customers instead of adding new ones. With the size of the market for commercial Hadoop offerings in question, building on the existing customer base seems the smarter approach.

Copyright © 2015 IDG Communications, Inc.