Hortonworks is bringing the popular open-source Apache Hadoop data processing platform to Microsoft shops.
The company has released a beta version of its Hortonworks Data Platform (HDP) Hadoop distribution for Windows and expects to release the final, enterprise-ready version in the months to come.
HDP is "the first and only distribution of Hadoop available on both Linux and Windows," said David McJannet, Hortonworks vice president of marketing.
According to McJannet, Hortonworks had heard strong demand from potential customers for a Hadoop distribution that runs on the Microsoft platform.
"The real catalyst is, frankly, market demand. The significant majority of the servers running in the enterprise today are running Windows Server," McJannet said. "We've seen significant interest from our customers towards using Hadoop on the platform that they rely on for their critical applications."
Over the past 18 months, Hortonworks and Microsoft have been porting the software to Windows and testing it for enterprise use, McJannet said. The HDP distribution bundles several software programs, including HDFS, MapReduce, Hive and Pig. Like the Linux version, the Windows HDP will be available as open source "so others can benefit and extend the work that we have done," McJannet said.
Going forward, Hortonworks will release new versions of HDP for both Linux and Windows. This first Windows beta is based on the HDP 1.1 codebase.
The initial Windows beta does not have feature parity with the Linux version. Although it includes all the "core components" needed to run Hadoop, McJannet said, it lacks the Ambari set of management tools. Hortonworks plans to bring the Windows version to full feature parity over time.
Hortonworks expects that the kind of workloads run on the Windows platform will be similar to those run on Linux, in terms of size and scope. "We fully anticipate some of the largest deployments of Hadoop could well be on Windows," McJannet said.
The distribution does not support mixing Windows and Linux nodes in the same deployment; each deployment must run entirely on one operating system. "In practice, we'd expect homogeneity across the infrastructure, though we'd have to wait and see how that pattern emerges," McJannet said.
Over time, Microsoft will add support in other software products, most notably System Center, for organizations that want to move Windows Hadoop workloads between their own data centers and Microsoft's Azure cloud service, said Herain Oberoi, Microsoft director of product marketing in the company's server and tools division.
As of press time, Hortonworks had not finalized which versions of Windows Server HDP will support, though the beta runs on Windows Server 2008 and Windows Server 2012. The product will not run on desktop versions of Windows.