Big data: There's signal in that noise

When the data you capture and crunch is large and disorderly, interesting bits may come along for the ride. Don't squeeze the life out of it -- explore it

Fearmongers warn that capturing data means capturing noise and that well-tended gardens are the only way to manage data. Well, guess what? Sometimes noise is the point.

Read any article by any not-very-technical journalist parroting those who sell fear -- or any comment on one of my posts from a scared cube dweller who was hoping to ride PL/SQL to retirement -- and you'll hear dire warnings that capturing all this data before you can interpret it will spell doom and disaster. Caution! Much of the data is noise! Noise is bad and you risk terrible error!


With any new approach you take risks. What if the noise turns out to be more valuable than the data you're trying to capture?

That's exactly what happened recently with Jawbone's Up, a popular activity-tracking wristband. Last year, Jawbone hired a vice president of data, Monica Rogati, to start mining the gobs of data accumulated so far. After the earthquake in Napa last weekend, Jawbone published a graph showing a large percentage of Up wearers awaking, with their wake time and the percentage of those awoken directly related to their proximity to the quake. Those in San Francisco, for example, woke up slightly later and in smaller numbers than those in Napa.

Not long ago the Wall Street Journal noted that Twitter found quakes faster than seismometers -- so imagine how quickly Up might work in detecting disasters compared to Twitter. There are obviously limitations and problems. (Heck, I still wish Jawbone could tell me how much more reliable the Up24 is compared to the Up Gen 2, but it refuses to say.) But imagine the potential!
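To make the idea concrete, here is a minimal sketch of how a sudden jump in the share of wearers who are awake could be flagged against a rolling baseline. Everything in it is assumed for illustration: the function name, the window size, the threshold, and the numbers are made up, and none of it reflects how Jawbone actually processes its data.

```python
from statistics import mean, stdev

def find_wake_spikes(awake_fraction, window=5, threshold=4.0):
    """Return indices where the awake fraction jumps far above the recent baseline.

    awake_fraction: per-minute share of wearers awake (0.0 to 1.0); hypothetical input.
    window: number of preceding minutes used as the baseline.
    threshold: how many baseline standard deviations count as a spike.
    """
    spikes = []
    for i in range(window, len(awake_fraction)):
        baseline = awake_fraction[i - window:i]
        mu = mean(baseline)
        # Floor sigma so a perfectly flat baseline does not divide by zero.
        sigma = max(stdev(baseline), 0.005)
        if (awake_fraction[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Made-up numbers: a quiet night, then a sudden jump at minute 8.
night = [0.02, 0.03, 0.02, 0.03, 0.02, 0.03, 0.02, 0.03, 0.60, 0.55, 0.50]
print(find_wake_spikes(night))
```

Only the first minute of the jump is flagged here, because once the spike enters the baseline window it inflates the baseline itself. A real system would also need to handle time zones, ordinary morning wake-ups, and location, which is exactly the kind of limitation mentioned above.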

Consider wildlife tracking. For years people have been radio tagging catch-and-release animals. Wouldn't they run from the epicenter? Perhaps that noise is exactly what we need. I realize they might not be tagged in sufficient numbers or have sufficient range to make that feasible, but it's a thought.

We've seen other instances of noise being more interesting than the data. Recently, GPS recordings taken to watch for plate movement indicated that the West Coast is rising. Why? Because all those people moved into the desert and planted palm trees and started drinking more water than could be piped in. Meanwhile, everyone was putting more carbon in the skies, which warmed the air and depleted the water further (not a shocker that there's a drought). The GPS readings indicate how water weighs down the land -- and the degree to which land rises when the water leaves. Now we have a new measure of true water depletion across the land.

Pure science revels in these side-effect numbers -- and curious scientists or people paid to rationalize the data figure out why. This "noise" ... what does it mean? Is it significant?

Now imagine the data you're capturing for your business. What useful noise might it contain? Therein lie key opportunities for business development, loss prevention, cost reduction, and especially supply chain planning. These opportunities span business units, demand data liberalization, and require pooling data beyond what we can plan for.

They also require people dedicated to mining the data, except that conventional data miners tend to put on blinders and do what they're told. You need domain experts to get involved and look for stuff you don't realize is important yet. This requires people to be curious, ask questions, and seek serendipity. Sure, you might find pirates fight global warming and note that correlation is not always causality, but you might also find that animals run from the epicenter faster than your fancy devices detect an earthquake.

You may also find out that your customers or coworkers are unconsciously communicating something to you. To some degree, this is a game of Data Katamari: You need to gather enough mass to create a star. There are whispers of wisdom in that noise. It's our job to listen hard and discover them.

This article, "Big data: There's signal in that noise," was originally published at InfoWorld.com. Keep up on the latest news in application development and read more of Andrew Oliver's Strategic Developer blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.
