VMware today announced advancements that will allow vSphere to manage Hadoop clusters.
In doing so, it gives the hundreds of thousands of VMware enterprise customers a way to work with Hadoop deployments from within software they already know. On the technical side, it builds on the company's work running Hadoop nodes on virtualized infrastructure, bringing the benefits of virtualization to the big data platform.
The company today announced a public beta of VMware vSphere Big Data Extensions, which will let the company's popular infrastructure management software control Hadoop clusters that customers set up. The extensions still require an underlying Hadoop distribution, such as those that Hortonworks, MapR, Cloudera, or VMware's partner Pivotal build from the open source Apache code. The extensions allow those distributions to be managed through vSphere. "VMware's enabling private enterprises to host their own big data as a service," says Michael Matchett, a senior analyst at the Taneja Group.
VMware built the features through its work on Project Serengeti, which aims to optimize Hadoop clusters for virtualized infrastructure. Matchett says that's a potentially significant move for the big data project and especially for companies deploying it. Running Hadoop nodes on virtual machines instead of bare metal brings many of the same advantages as virtualizing compute servers: more efficient use of hardware resources and more flexibility in managing the system. "You can come out ahead hosting Hadoop in a virtual environment because it gives you the ability to mix in other workloads and take full advantage of the infrastructure across multiple clients," Matchett says.
Other companies have also worked on virtualizing Hadoop clusters. Amazon Web Services offers Elastic MapReduce (EMR), which is essentially a Hadoop service hosted in the public cloud. VMware, though, is targeting private cloud and on-premises deployments.
Adding Hadoop support to vSphere could also foreshadow other moves by VMware. For example, the company could extend the platform so that Hadoop workloads managed by vSphere can be migrated easily to its upcoming public cloud offering, which is set to be released later this year. Other companies, particularly Microsoft, could be next in line to let their own management software control Hadoop distributions; for Microsoft, that would mean System Center managing Hadoop running on its Hyper-V hypervisor.
VMware is releasing the new features as a public beta that vSphere 5.1 customers can sign up for this week; it expects the functionality to be generally available by the end of the year. In addition to announcing the extensions, VMware said Project Serengeti now supports the latest open source code from Apache Hadoop, including YARN, a new resource manager that some in the Hadoop community believe could open the floodgates for new applications built on top of the Hadoop platform.
Network World senior writer Brandon Butler covers cloud computing and social collaboration. He can be reached at BButler@nww.com and found on Twitter at @BButlerNWW.
This story, "Big virtualization: VMware is virtualizing Hadoop" was originally published by Network World.