Hadoop is all the rage, it seems. With more than 150 enterprises of various sizes using it -- including major companies such as JP Morgan Chase, Google and Yahoo -- it may seem inevitable that the open-source big data management system will land in your shop, too.
But before rushing in, make sure you know what you're signing up for. Using Hadoop requires training and a level of analytics expertise that not all companies have yet, customers and industry analysts say. And it's still a very young market; a number of Hadoop vendors are duking it out with various implementations, including cloud-based ones.
Most important, perhaps: Don't buy into the hype. Forrester Research analyst James Kobielus points out that only 1 percent of U.S. enterprises are using Hadoop in production environments. "That will double or triple in the coming year," he expects, but caution is still called for, as with any up-and-coming technology.
To be sure, Hadoop has advantages over traditional database management systems, especially the ability to handle both structured data, like that found in relational databases, and unstructured information such as video -- and lots of it. The system can also scale up with a minimum of fuss and bother. eBay, the online global marketplace, has 9 petabytes of data, comprising structured data on clusters from Teradata and unstructured data on Hadoop-based clusters running on "thousands" of nodes, according to Hugh Williams, vice president of experience, search and platforms for the company.
"Hadoop has really changed the landscape for us," he says.
"You can run lots of different jobs of different types on the same hardware. The world pre-Hadoop was fairly inflexible that way," Williams explains. "You can make full use of a cluster in a way that's different from the way the last user used it. It allows you to create innovation with very little barrier to entry. That's pretty powerful."