Cloudera preps Hadoop for the enterprise
Cloudera expands its commercial Hadoop packages, gears up to offer the technology to enterprises as an alternative to relational databases
Cloudera has unveiled a new set of Hadoop management tools, called Cloudera Enterprise, that the company will offer for an annual subscription fee, it announced on Tuesday. It has also updated its open-source distribution package of Hadoop.
Both new releases, as well as several new partnerships with data management software vendors, show the company gearing up to offer the emerging database technology -- now mostly used by Web giants like Google and Yahoo -- to the enterprise market as an alternative to relational databases.
"Our bet is not only the big Web companies, but banks, hospitals, and insurance companies will discover they need to analyze complex and structured data together, and Hadoop was made for that," said Cloudera CEO Mike Olson. "Hadoop solves a new problem, in a new way."
One of a growing number of non-SQL, or NoSQL, databases, Hadoop is based on Google's MapReduce, a framework for processing data in parallel across large numbers of computer nodes. Hadoop, now developed as an open-source project by the Apache Software Foundation, offers an alternative to traditional relational databases, at least for analyzing large, quickly changing data sets.
It can work with both SQL and non-SQL data, and is more resilient to server failure than relational databases, Olson said.
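To make the MapReduce model described above concrete, here is a minimal single-process sketch in Python of a word count split into map, shuffle, and reduce steps. It is an illustration only, not Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real framework would run the map and reduce steps in parallel on many nodes rather than in one loop.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (assumption: a single process
    stands in for the cluster that would normally share this work)."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each call to map_phase depends only on its own record, and each call to reduce_phase only on its own key, the work can be spread over many machines, which is the property the framework relies on.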
Cloudera is packaging Hadoop for midlevel organizations, both with its Hadoop distribution, and its newly released set of management tools. Both packages should allow organizations without a lot of in-depth technical experience in Hadoop to run the software, Olson said. "There is this myth that Hadoop is usable if you have Google-scaled data. There are many users who have merely a few terabytes of data that they wish to analyze," Olson said.
Cloudera's Distribution for Hadoop (CDH) is an open-source package of pre-integrated software programs built around Hadoop Common, formerly named Hadoop Core. The package includes Hive, which provides a data warehouse infrastructure; HBase, a distributed database that runs on top of Hadoop; Pig, a high-level language and compiler for MapReduce programs; ZooKeeper, a coordination service for applications running across multiple servers; and MapReduce.
In the newly released version 3, the package includes three programs that the company has released as open-source projects under the Apache License, Version 2.0. One is Flume, which assists in loading data into Hadoop. Another new addition is Oozie, which is workflow management software. The last is the Hadoop User Environment (HUE) code, which provides a user interface for managing Hadoop.
"HUE allows anyone to build applications targeted at analysts. It knows how to talk to the Hadoop clusters," Olson said.
The Cloudera Enterprise package augments CDH version 3 with additional management tools. This new software, which is not open source, allows administrators to manage access control through the Lightweight Directory Access Protocol (LDAP). Programs are also provided to provision resources, manage configuration, and monitor performance.