Just how big are big data? Not the big data hype bubble, mind you; we know that's enormous. Rather, how large do data sets have to be before we can consider them big data?
There is no one answer. Big data is a relative term. It refers to data sets, and the corresponding data challenges, so large that traditional data management and analytics approaches aren't up to the task of squeezing all the value we desire from the information we have. As a result, as our tools and techniques improve, the "bigness" threshold for big data will continue to rise.
This threshold also depends upon the context for the data, which generally aligns with the industry responsible for them. Genomics research, weather prediction and other scientific pursuits push the limit of data set size, but any business that collects information about its customers may also have big data challenges.
Keep in mind Parkinson's Law of Data: the amount of data available expands to fill the available space for it. As our technology for creating, moving and storing data improves, the big data threshold will continue to rise. If anything, it seems the relentless advance of technology is driving the ever-increasing acquisition of information, and this deluge promises to swamp even the most facile of big data strategies.
The central big data challenge, of course, is how to derive value from such immense data sets: essentially, recovering the rare gems in the rough by identifying the important, meaningful and insightful nuggets in the onslaught of noise.
Counterintuitively, the more information we have, the less we actually desire, since we only prize the results of careful analysis of our big data, not the data themselves. A mountain containing gold is worthless, regardless of the size of the mountain, if the cost of extracting the precious material exceeds its value.
Today, the U.S. government faces the mother of all big data mountains. From National Oceanic and Atmospheric Administration (NOAA) weather data to earth science information from the U.S. Geological Survey (USGS) to the genomics data at the National Institutes of Health (NIH), the government (and, therefore, the American people) owns perhaps the largest collection of big data sets on this planet.