Doing big data right takes sophisticated techniques to ensure ad hoc results are reliable

Big data danger No. 3: This isn't why the data was collected

A big reason companies are collecting big data is to discover previously unrecognized correlations -- patterns, trends, associations and so on -- they can turn to their advantage.

Or to their disadvantage. Professional statisticians know the importance of designing data collection so that it yields a sample that can support legitimate analysis. Analyze data that wasn't collected for that purpose, and there are any number of ways to get wrong results out of it.

Real-world example: Back in my deep, dark past, I was responsible for analyzing the performance of the six presses owned by the daily newspaper I worked for. I dug into the database and discovered, to my surprise, that the press everyone thought was the best performer -- it was 20 years younger and built on more modern technology -- was in fact delivering the worst results.

Fortunately, I was just barely smart enough to talk over my findings with a frontline manager before presenting them to upper management. I say "fortunately" because the press manager I talked to explained the facts to me: The press in question ran all the most difficult jobs. As the database I was analyzing didn't have job_difficulty_rating as a field, my analysis missed this subtlety.
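To see how a missing field can flip a conclusion, here is a minimal sketch in Python. Everything in it is made up for illustration: the two press names, the job counts, and the on-time metric are assumptions, not figures from the newspaper's database. The only thing borrowed from the story is the idea of a job_difficulty_rating field that the original analysis didn't have.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (press, job_difficulty_rating, jobs_run, jobs_on_time)
RUNS = [
    ("new_press", "easy", 20, 18),
    ("new_press", "hard", 80, 56),
    ("old_press", "easy", 80, 68),
    ("old_press", "hard", 20, 12),
]


def on_time_rate(rows, key):
    """Return the on-time rate for each group produced by key(press, difficulty)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [jobs_run, jobs_on_time]
    for press, difficulty, jobs, on_time in rows:
        group = key(press, difficulty)
        totals[group][0] += jobs
        totals[group][1] += on_time
    return {group: on_time / jobs for group, (jobs, on_time) in totals.items()}


# What the database could show: one number per press, difficulty ignored.
overall = on_time_rate(RUNS, lambda press, difficulty: press)

# What the press manager knew: compare like with like.
by_difficulty = on_time_rate(RUNS, lambda press, difficulty: (press, difficulty))

for press, rate in sorted(overall.items()):
    print(f"{press} overall: {rate:.0%}")
for (press, difficulty), rate in sorted(by_difficulty.items()):
    print(f"{press} on {difficulty} jobs: {rate:.0%}")
```

With these invented numbers, the older press looks better overall, yet the newer press comes out ahead on easy jobs and on hard jobs alike. The overall figure mostly reflects which press was handed the hard work, which is the subtlety my analysis missed.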

Beware big data dangers -- but don't be scared off by them

None of this is a reason to ignore big data. Its potential value is, in many situations, significant and possibly transformational.

Doing it well is far from trivial, though. So when you head down the big data path, make sure your implementation minimizes the chances of getting answers that are clear, compelling, and wrong.

