Users of Informatica's business intelligence (BI) platform can now analyze data housed by the Cloudera Distribution for Hadoop (CDH), an open-source data storage framework.
The two companies are pitching this combination of technologies as a way for enterprises to pair transactional data, often kept in a data warehouse, with massive amounts of social media-related information.
"This partnership enables organizations to take advantage of a proven data integration platform and this new data management technology to process and analyze enormous amounts of data," said Mike Olson, chief executive officer, Cloudera, in a statement.
Enterprises seem to be curious about using Hadoop for business intelligence duties, particularly for data sets too large to fit in traditional data warehouses. Business intelligence provider Pentaho and data warehouse systems vendor Teradata have also adapted their products to work with Hadoop.
The combined packages could be used to analyze customer churn and other forms of risk, to facilitate cross-selling of multiple products and to test how well advertising campaigns are working, the companies suggested.
Users will be able to move data in and out of Hadoop through Informatica's graphical environment. They can deploy the Sqoop tool to issue SQL commands against Hadoop data. Data mappings developed within Informatica can also be used against Hadoop instances, using a combination of MapReduce functions and Hadoop User Defined Functions.
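For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal sketch in plain Python of the map, group and reduce pattern that Hadoop distributes across a cluster. It is an illustration only, not Informatica's or Cloudera's code: the function names and the sample transaction records are hypothetical, and a real Hadoop job would run the same steps in parallel across many machines.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit a (key, value) pair for one input record."""
    customer_id, amount = record
    yield customer_id, amount


def reduce_values(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_map_reduce(records):
    # Shuffle: group every mapped value under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    # Reduce each group independently; a cluster would do this in parallel.
    return dict(reduce_values(key, values) for key, values in grouped.items())


# Hypothetical transaction records: (customer_id, purchase_amount).
transactions = [("c1", 20), ("c2", 5), ("c1", 30), ("c3", 12), ("c2", 8)]
totals = run_map_reduce(transactions)
```

Here `totals` holds the summed purchase amount per customer, the kind of per-key aggregate that could feed an analysis such as customer churn.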
One big user of Hadoop is eBay, which is building an 8,500-processor, 16-petabyte Hadoop cluster that should be running by the end of the year, revealed Anil Madan, eBay's director of engineering for analytical platforms, at the HadoopWorld conference, held last month in New York.
EBay will use this system to improve the site's search capability by generating more elaborate decision trees, and to build better models for detecting fraud, Madan said.