Also, Kobielus explains, best practices with Hadoop are still evolving, so it's best to identify a short-term benefit you can get from the system and avoid anything too long-term at the start. As you build up expertise, you can figure out more things to do with the software. In the meantime, the approaches that early adopters are using to build out and scale their clusters "are all over the board," he says.
Adds to, doesn't replace, other databases
Most customers are using Hadoop in addition to, not instead of, other types of software. At eBay, for instance, the company still uses relational databases and also does "a lot of custom [database] work," Williams explains. "At eBay, we see value in using multiple technologies to work with our data. Hadoop is a terrific choice for certain uses, while other technologies work alongside it for other purposes."
For example, when it comes to transactions, "it makes total sense to use a relational database system," he says. But overall the idea is to remain "flexible in what technologies we use at eBay; we don't see a world where there will be one unifying technology."
The same is true at Concurrent. Hadoop hasn't replaced the company's use of traditional relational databases, including MySQL, PostgreSQL and Oracle. "It is a combined solution," Lazzaro says. "We use Hadoop to do the heavy lifting, such as large-scale data processing. We then use Map/Reduce within Hadoop to create summary data that is easily accessible through a traditional RDBMS."
What tends to happen in relational databases, he explains, is that when the system gets too large -- to, say, 250 million records a day -- the database becomes "non-responsive to data queries." "However," he says, "Hadoop at that scale is not even breaking a sweat. Hadoop therefore can store, say, 5 billion records and with Map/Reduce we can create a summary of that data and insert it into a standard RDBMS for quick access."
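The pattern Lazzaro describes, summarizing a large raw data set with a map step and a reduce step and then loading only the summary into a relational database, can be sketched in a few lines. This is a minimal, in-process illustration, not Hadoop code and not Concurrent's actual pipeline: the record layout, the function names (`map_record`, `reduce_counts`, `load_summary`) and the `daily_summary` table are all hypothetical, and SQLite stands in for whatever RDBMS would hold the summary.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical raw records: (day, event_type). In the setup described above,
# these would live in Hadoop, not in a Python list.
RAW_RECORDS = [
    ("day-1", "click"),
    ("day-1", "view"),
    ("day-1", "click"),
    ("day-2", "view"),
    ("day-2", "click"),
]


def map_record(record):
    """Map step: emit one ((day, event_type), 1) pair per raw record."""
    day, event_type = record
    yield (day, event_type), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def summarize(records):
    """Run the map step over every record, then reduce the results."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    return reduce_counts(mapped)


def load_summary(conn, summary):
    """Write the (much smaller) summary into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_summary ("
        "day TEXT, event_type TEXT, total INTEGER, "
        "PRIMARY KEY (day, event_type))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?)",
        [(day, event, total) for (day, event), total in sorted(summary.items())],
    )
    conn.commit()


# SQLite in memory is an assumption made to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
load_summary(conn, summarize(RAW_RECORDS))
rows = conn.execute(
    "SELECT day, event_type, total FROM daily_summary ORDER BY day, event_type"
).fetchall()
print(rows)
```

The point of the split is that the relational database only ever sees the aggregated rows, which stay small no matter how many raw records the batch side has to scan.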
In general, Williams says, "I don't think too much" about Hadoop's limitations. "I think about the opportunities. You can find solutions to any problems pretty quickly" through the open source community. "Some people do gripe about different aspects of Hadoop, but it's a reasonably new thing. It's like Linux was back in 1993 or 1994."
"We do see unique technology challenges at our scale and with our extreme data," Williams explains, among them architecting data centers, designing a network to support Hadoop and choosing the right hardware.
Overall, Hadoop has been a good strategy for eBay, Williams says. "For us it's been an absolute game changer. It's what our engineers want to use and it's really helped us become a really data-driven company."
Enterprise Hadoop vendors
The free open source application, Apache Hadoop, is available for enterprise IT departments to download, use and change however they wish.
But for many business users, the need for support and technical expertise often outweighs the lure of free do-it-yourself applications, especially when critical IT systems are at stake.
That's where supported, enterprise-ready versions of Hadoop can be a better, more realistic option.
Here is a sampling of some of the major commercial vendors that can help your company get started with Hadoop. Some offer on-premises software packages; others sell Hadoop in the cloud. There are also some Hadoop database appliances beginning to appear, including the recently announced .