With Hadoop World NYC just around the corner on Oct. 2, 2009, I thought I'd share two pieces of news.
First, I've received a 25 percent discount code for readers thinking about attending Hadoop World. Hurry because the code expires on Sept. 21.
Second, check out this Q&A with New York Times software engineer and Hadoop user Derek Gottfrid. Derek's doing some very cool work with Hadoop and will be presenting at Hadoop World.
Open Sources: What got you interested in Hadoop initially and how long have you been using Hadoop?
Gottfrid: I've been working with Hadoop for the last three years. Back in 2007, the New York Times decided to make all the public domain articles from 1851-1922 available free of charge in the form of images scanned from the original paper. That's 11 million articles available as images in PDF format. The code to generate the PDFs was fairly straightforward, but to get it to run in parallel across multiple machines was an issue. As I wrote about in detail back then, I came across the MapReduce paper from Google. That, coupled with what I had learned about Hadoop, got me started on the road to tackle this huge data challenge.
Open Sources: How do you use Hadoop at the New York Times and why has it been the best solution for what you're trying to accomplish?
Gottfrid: We continue to use Hadoop as a one-time batch process for tremendous volumes of image data at the New York Times. We've also moved up the food chain and use Hadoop for traditional text analytics and Web mining. It's the most cost-effective solution for processing and analyzing large sets of data, such as user logs.
Open Sources: How would you like to see Hadoop evolve? Or what are the three features you'd most like to see in Hadoop?
Gottfrid: I'd like to see the Hadoop road map clarified, as well as the individual subprojects to get rid of some of the weird interdependencies so that we can get to a legitimate 1.0 release that solidifies the APIs.
Open Sources: What can attendees expect to learn about Hadoop from your preso at Hadoop World?
