Big Data does not necessarily mean Good Data. And that, a growing number of experts are saying with increasing insistence, means Big Data does not automatically yield good analytics.
If the data is incomplete, out of context or otherwise contaminated, it can lead to decisions that could undermine the competitiveness of an enterprise or damage the personal lives of individuals.
One of the classic stories of how data out of context can lead to distorted conclusions comes from Harvard University professor Gary King, director of the Institute for Quantitative Social Science. A Big Data project was attempting to use Twitter feeds and other social media posts to predict the U.S. unemployment rate, by monitoring key words like "jobs," "unemployment," and "classifieds."
Using an analytics technique called sentiment analysis, the group collected tweets and other social media posts that included these words to see if there were correlations between an increase or decrease in them and the monthly unemployment rate.
While monitoring them, the researchers noticed a huge spike in the number of tweets containing one of those key words. But, as King noted, they later discovered it had nothing to do with unemployment. "What they hadn't noticed was Steve Jobs died," he said.
In the telling, it's a somewhat humorous story, apart from the tragedy of Jobs' untimely passing. But the lesson is a deadly serious one for those looking to rely on the magic of Big Data to guide their decisions.
King said the mix-up over the dual meanings of "jobs" is "just one of many similar anecdotes. Anyone working in this area has had similar experiences."
"Lists of keywords, curated by human beings, work OK for the short run, but tend to fail catastrophically over the long run," he said. "You can fix it up by adding exceptions, but there's a lot of human labor involved."
He said it is easy for anyone to create their own example just by entering a keyword into the Bing Social page.
"You'll see some relevant things and some irrelevant. If you don't change the query and watch over time, you will often find the conversation veering away in some way -- sometimes a little, sometimes not at all for a while, and sometimes dramatically," he said.
But King said that overall there are many examples of Big Data analytics producing useful results, "so failures tend not to appear in the literature."
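The failure mode King describes, a hand-curated keyword list patched with exceptions, can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual method: the three keywords come from the story above, while the sample posts, the `EXCLUSIONS` list and the function names are hypothetical.

```python
import re

# Keywords named in the article.
KEYWORDS = {"jobs", "unemployment", "classifieds"}

# Hypothetical hand-curated exception, added after the fact.
EXCLUSIONS = ("steve jobs",)


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches_keyword(post):
    """Return True if any token in the post is a monitored keyword."""
    return any(token in KEYWORDS for token in tokenize(post))


def count_matches(posts, exclusions=()):
    """Count posts that hit a keyword, skipping any post that contains
    one of the exclusion phrases."""
    count = 0
    for post in posts:
        lowered = post.lower()
        if any(phrase in lowered for phrase in exclusions):
            continue
        if matches_keyword(post):
            count += 1
    return count


# Made-up sample posts, for illustration only.
posts = [
    "No jobs in my town, unemployment is brutal",
    "Checking the classifieds again today",
    "RIP Steve Jobs, you changed everything",
    "Steve Jobs has died. So sad.",
    "Lovely weather this morning",
]

naive = count_matches(posts)                 # counts the Steve Jobs posts too
filtered = count_matches(posts, EXCLUSIONS)  # drops them via the exception list

print(naive, filtered)
```

On these sample posts the naive count is 4 and the filtered count is 2. The exception is itself crude: it would also discard a post that mentions Steve Jobs while genuinely discussing employment, which is the kind of ongoing human upkeep King is pointing to.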
Kim Jones, senior vice president and CSO of Vantiv, said this is not a new problem, but one that can be magnified if people think massive amounts of data are magically going to produce good analytics.
"The Jobs example was a classic case of data without context. Data by itself doesn't equal intelligence," he said.