MySpace figures out how to do massive data analysis on commodity systems

2009 InfoWorld CTO 25 Awards: Aber Whitcomb

It's sometimes hard to fathom the scale of the Web. Yet as CTO of News Corp.'s social site MySpace, Aber Whitcomb has to not only fathom it but build for it. In 2008, his business intelligence (BI) team built one of the largest data warehouses in the world, capturing between 7 billion and 10 billion events generated each day by the site's 130 million users. Whitcomb's team did so using commodity hardware, giving MySpace supercomputer-like analytic capabilities for a fraction of the cost.

Running a Web business on commodity hardware is not a new idea -- both and Google do so, for example. Both use MapReduce, the technology Google introduced in the early 2000s to break apart data sets for parallelized computing. (Google made MapReduce available to others in mid-2008.) But MySpace's implementation of Aster Data Systems' nCluster as its data warehouse extends MapReduce to handle rich in-database analytics on massive data volumes.
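
For readers unfamiliar with the MapReduce pattern, the idea can be illustrated with a minimal single-machine sketch in Python. This is not MySpace's, Google's, or Aster Data's code; the event records, field layout, and function names below are hypothetical and chosen only to show the three steps: split the data into chunks, map each chunk to key/value pairs, then group and reduce by key.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, hour_of_day, event_type).
EVENTS = [
    ("u1", 9, "login"),
    ("u2", 9, "page_view"),
    ("u1", 10, "page_view"),
    ("u3", 10, "page_view"),
    ("u2", 10, "message"),
]


def map_phase(chunk):
    """Emit one (hour, 1) pair for every event in a chunk."""
    return [(hour, 1) for _user, hour, _kind in chunk]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def events_per_hour(events, n_chunks=2):
    """Count events per hour by splitting the data set into independent chunks."""
    # Each chunk could be handed to a different commodity machine;
    # here the chunks are simply processed one after another.
    chunks = [events[i::n_chunks] for i in range(n_chunks)]
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


print(events_per_hour(EVENTS))  # {9: 2, 10: 3}
```

Because each chunk is mapped independently, the result does not depend on how many chunks the data is split into, which is what lets this kind of work spread across many inexpensive servers.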

What MySpace gained, says Whitcomb, is a full understanding of what is happening online, reflecting what people are doing on an hourly basis, both to give marketing efforts an edge and to identify customer issues before they spiral out of control.
