My favorite buzz phrase associated with the big data trend is: "Data is the new oil." It implies that, at last, we can think of data almost as a natural resource, rather than simply a burden on data center infrastructure.
That exciting sense of potential is the reason we just launched InfoWorld's big data channel, which features a new blog, Think Big Data, by contributor Andrew Lampitt, who has been involved in a number of big data startups and has a clear sense of how the trend is taking shape. Andrew will focus on case studies that highlight the practical value of new technologies to explore and analyze vast quantities of data.
At InfoWorld, we first identified the importance of big data technologies three years ago, when we picked MapReduce/Hadoop as the No. 1 emerging enterprise technology of 2009. The big data trend encompasses more than that particular technology, although the ability to crunch semi-structured data using massively parallel processing distributed across commodity hardware is perhaps the most important new capability.
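For readers unfamiliar with the programming model behind Hadoop, the idea can be sketched in miniature: a map step emits key/value pairs from raw records, and a reduce step aggregates values per key. The sketch below is a single-process Python illustration of the pattern only, not Hadoop itself, and the clickstream-style records are hypothetical examples.

```python
from collections import defaultdict

def map_phase(records):
    # The "map" step: emit a (key, 1) pair for each token in each record.
    # In a real cluster, this work is distributed across many machines.
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # The "shuffle" and "reduce" steps, collapsed: group values by key
    # and sum them to produce per-key counts.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Hypothetical semi-structured log records.
records = [
    "GET /home GET /cart",
    "GET /home POST /checkout",
]
word_counts = reduce_phase(map_phase(records))
```

Because map and reduce are independent, pure functions over key/value pairs, a framework like Hadoop can run thousands of map tasks in parallel on commodity machines and merge their outputs, which is the capability described above.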
The explosion in NoSQL databases, explored by InfoWorld's Andrew Oliver in "Which freaking database should I use?," indicates a major shift in thinking about data management. And new data visualization tools enable domain experts to form a direct relationship with data sets and spot trends and patterns worthy of further exploration. That's a big change from the methodology of business intelligence, which usually requires stakeholders to submit requirements to experts, who tend to produce static reports that analyze trends mainly of historical interest.
One way of thinking about big data is that these new technologies have arrived in the nick of time. Regulatory compliance concerns have created a "save everything" culture in the enterprise, and the fastest-growing segments are log files that record security and system events, and metadata that captures the behavior of visitors as they use Web applications. InfoWorld has described the collective effect of all these new sources as the data explosion, in which -- according to a widely cited 2008 IDC report -- the storage requirements of enterprises are increasing at a rate of 50 percent per year.
Facebook and Yahoo have been the highest-profile companies to use big Hadoop clusters to process Web clickstream data for insight on visitor behavior. But the opportunities in scientific research, manufacturing, public safety, and innumerable other areas where pools of data run deep are even more profound. We're only at the very beginning of an era when sensors will be deployed everywhere to monitor anything worth measuring, from outpatient vital signs to vehicle traffic patterns to soil conditions. The potential for collecting and analyzing magnitudes more data is staggering.
Recently, I had a conversation with Sanjay Mehta, vice president of marketing for Splunk, a pioneer in using MapReduce to derive insight from system and security event logs in real time. He notes that the proliferation of new mobile devices alone is rapidly increasing the quantity of data collected.