MapReduce, not just for Google et al. anymore

Open source startup Cloudera aims to bring MapReduce-style applications to mere mortals (aka enterprises)

Cloudera, an open source startup working to expand the use of Apache Hadoop, made two announcements today. First, it has secured $5 million in Series A funding. Second, the Cloudera Distribution for Hadoop is now available.

What's Hadoop? It's a platform for developing applications that can process vast datasets while scaling to the levels that companies like Google, Facebook, and Yahoo require. Hadoop is an Apache project that:

implements MapReduce using the Hadoop Distributed File System (HDFS) (see figure below). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.
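To make the "many small blocks of work" idea concrete, here is a minimal single-machine sketch of the MapReduce pattern in plain Python, using word counting as the job. It is not Hadoop code and does not touch HDFS; the function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_counts`, `word_count`) and the `block_size` parameter are my own illustrative choices. In a real cluster, each map call would run on a node that already holds a replica of its block.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Divide the input into small, independent blocks of work."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group the emitted values by key across all mapped blocks."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2):
    """Run the whole pipeline in-process (hypothetical driver, not a Hadoop API)."""
    blocks = split_into_blocks(lines, block_size)
    # Each map_block call depends only on its own block, so a framework
    # could run these calls in parallel on different machines.
    mapped = [map_block(block) for block in blocks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = [
        "Hadoop implements MapReduce",
        "MapReduce divides work",
        "HDFS replicates data blocks",
    ]
    print(word_count(sample))
```

The point of the split is that the map step never needs to see more than one block at a time, which is what lets a framework like Hadoop move the computation to the data rather than the other way around.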

Cloudera sees a market for Hadoop in enterprise settings, from genome and protein data analysis to oil and gas exploration and financial processing. The Cloudera Distribution for Hadoop is open source and licensed under the Apache Software License 2.0. Cloudera intends to drive revenue from support and implementation services.

I've typically been down on support- or services-based open source businesses. However, in the case of Cloudera, this model makes sense -- for now. The number of people who can implement a highly scalable application that processes petabytes of independent data relationships using the MapReduce programming model, and who don't work for Google, Yahoo, Facebook, and the like, can probably be counted on two hands. There is a degree of education and hand-holding that Cloudera needs to do while enterprise developers explore writing this style of application.

Take a look at the investors and it's easy to predict that Mike Olson and team won't be independent for long:

In addition to Accel Partners, investors in Cloudera include Mike Abbott (senior vice president, Palm), David desJardins (early Google employee), Caterina Fake (co-founder, Flickr), David Gerster (entrepreneur), Youssri Helmy (entrepreneur), Dr. Qi Lu (president of the Online Services Group, Microsoft; former executive vice president, Yahoo!), Marten Mickos (former CEO, MySQL), In Sik Rhee (former chief tactician, Opsware; founder, Loudcloud), Jeff Weiner (president, LinkedIn; former senior vice president, Yahoo!), Dick Williams (CEO, Illustra; former CEO, Wily Technology), Gideon Yu (Facebook CFO; former senior vice president, Yahoo!; CFO, YouTube).

All the best to the Cloudera team.

p.s.: I should state: "The postings on this site are my own and don't necessarily represent IBM's positions, strategies, or opinions."

Copyright © 2009 IDG Communications, Inc.