Relief is on the way for users of the open source Apache Hadoop distributed computing platform who have wrestled with the complexity of the technology.
A planned upgrade to Hadoop, which has become popular for analyzing large volumes of data, is intended to make the platform more user-friendly, said Eric Baldeschwieler, CEO of HortonWorks, which was unveiled as a Yahoo spinoff last month with the intent of building a support and training business around Hadoop. The upgrade will also feature improvements in high availability, installation, and data management. Due in beta later this year, with general availability expected in the second quarter of 2012, the release will probably be called Hadoop 0.23.
"A big focus for us is going to be adding tools for monitoring and distributing and management, [with the goal of making it] much easier for organizations to use Hadoop. The problem now is it takes a pretty sophisticated operations staff to install and use it," Baldeschwieler said during an interview at HortonWorks's Silicon Valley offices this week. He was previously vice president of Hadoop engineering at Yahoo, which has been instrumental in Hadoop's development.
Version 0.23 is also slated for improvements in availability, performance, and scalability. "That's a big one for very large customers," such as Yahoo and Facebook, Baldeschwieler said. Another goal will be eliminating single points of failure in Hadoop's master nodes.
In addition, HCatalog, a new data management software layer planned for Hadoop 0.23, will let users store data in a more traditional table format and move it transparently between tools. It will also benefit the MapReduce programming model used with Hadoop. Currently, users can work with two higher-level languages on top of Hadoop, Pig and Hive, each of which has its own specialized data store, Baldeschwieler said. "What HCatalog's going to allow is for Pig and Hive and MapReduce itself to operate on one set of tables," he said.
An Apache representative concurred that goals for Hadoop include improvements for high availability, data management, and user friendliness, but Apache would not confirm what will be in the next release or what the version number will be. Because of Hadoop's culture of continuous beta releases, there has yet to be a formal 1.0 release, Baldeschwieler said. "There will come a point where we will want to call it 1.0 or 2.0."
This article, "Apache Hadoop to get more user friendly," was originally published at InfoWorld.com.