Hadoop is coming out of the shadows and into production in IT shops that are drawn to its ability to store, process and analyze extremely large volumes of data. But the relative newness of the open-source platform and a shortage of experienced Hadoop talent pose technical challenges that enterprise IT teams need to address.
Hadoop grew out of the work of Doug Cutting and Mike Cafarella, who originally developed it to support Apache Nutch, an open-source search engine. It became an Apache project when Cutting and a team of engineers at Yahoo split the distributed computing code out of the Nutch crawler to create Hadoop.
Today Hadoop powers every click at Yahoo, where the Hadoop production environment spans more than 42,000 nodes. Scale of that kind is Hadoop's sweet spot: the platform is designed to handle data-intensive distributed applications spanning thousands of nodes and exabytes of data, with a high degree of fault tolerance.
Hadoop pioneers in the online world -- including eBay, Facebook, LinkedIn, Netflix and Twitter -- paved the way for organizations in other data-intensive sectors such as finance, technology, telecom and government. Increasingly, IT shops are finding a place for Hadoop in their data architecture plans. The appeal, in a nutshell, is that Hadoop can enable massively parallel computing on inexpensive commodity servers. Companies can collect more data, retain it longer and perform analyses that weren't practical in the past because of cost, complexity and a lack of tools.
At Concurrent Computer, the decision to use Hadoop was driven in large part by volume.
"Scalability was the biggest concern. With a traditional relational database, every time you want to scale or get bigger, you end up paying a premium," says Will Lazzaro, director of engineering at Concurrent, which provides video-on-demand systems and processes billions of records a day related to viewers, content consumption and platform operations.