Informatica is joining the growing ranks of vendors moving to support Hadoop, the open-source framework for large-scale or "big data" processing, the company announced Monday.
The 9.1 version of Informatica's platform features a connector to the Hadoop file system (HDFS), allowing customers to move data in and out of Hadoop clusters.
While the Hadoop project has its roots in Web companies, having been led by Yahoo, enterprises are quickly warming up to it as well, said James Markarian, Informatica executive vice president and CTO.
The problem is that corporate IT shops may not have the right sort of in-house expertise, Markarian said.
"It's early days for Hadoop, but as it starts to get more mainstream, the kind of developers that can make use of it is really changing," he said. "Your average guy in IT, they don't know MapReduce, they don't know Hive, they don't know Pig," he said, referring to a range of Hadoop tools. "That's where we come in. You don't need to learn anything else. Take your Informatica skills and we'll bring you to Hadoop."
Informatica 9.1 also targets the growing number of information types that get lumped under the "big data" header. Along with "near-universal" connections to transactional databases like IBM DB2 and Oracle, as well as analytic-focused data platforms such as Netezza and Teradata, the new release can also pull in data from social sites like Facebook, Twitter and LinkedIn.
Other aspects of Informatica's platform, such as MDM (master data management), data quality and self-service tools, are also getting a range of updates as part of the 9.1 release.
The social media and Hadoop connectors are sold separately from the core platform, according to Markarian. Pricing was not immediately available.
It is indeed early days for Hadoop, according to Forrester Research analyst James Kobielus.
While Informatica can now load and retrieve data from Hadoop clusters, that capability is not necessarily different from what a number of data warehousing vendors already offer, and other data integration vendors are likely to follow Informatica's move, he said.
Overall, effective use of Hadoop is not about one tool, according to Kobielus. Early adopters should look to standardize their activities on a core stack of technologies, which hasn't emerged yet, he said. So far, it seems like the only common element in Hadoop projects is the use of MapReduce for the modeling layer, he said.
"I would like to see Informatica and other data integration vendors offer rich IDEs (integrated development environments) for [Hadoop]," Kobielus said. "I strongly expect they will do that."
Chris Kanaracus covers enterprise software and general technology breaking news for The IDG News Service. Chris's email address is Chris_Kanaracus@idg.com.