You Need a Needle to Make Use of the Haystack

Big Data is the process, not the numbers. Here's how to make it work for you.


Read the tech press and you are all but guaranteed to see headlines like: Big Data will solve this. Big Data will solve that. Big Data is the hot thing. You have to have Big Data.

Indeed, Big Data is so ubiquitous that it’s understandable when people think they know what it means, even if they don’t. Just parse the simple, two-word term and it’s clear that what’s being talked about is a huge pile of information, right?


Big Data is the process, not the numbers. It’s the ability to analyze any kind of information to yield robust, repeatable, actionable insights. Given the confusion there’s no surprise that a lot of people don’t understand how to use it.

There’s a misperception that the best process is to gather all your information, put it all into one stack, and run analysis programs that spit out insights. When this naïve approach fails, people begin to think of Big Data as a money pit. A recent survey of C-suite executives by Actian found that 77 percent feel their Big Data and analytics deployments fail to live up to their expectations.

They’re right to feel that way. If you take a data-first approach, you’ve almost certainly spent far more than you needed to on a system capable of doing anything and everything. But the fact is you don’t need to do anything and everything. You need to answer a question. There is something you need to know but don’t; that’s why you’re seeking a Big Data solution. Once you have the question, you build the minimum infrastructure needed to answer it. Going about it any other way is like looking for a needle in a haystack when you don’t even know what a needle looks like.

Here is the #1 thing to remember about Big Data: The question is the answer. You start with one big question and then refine, refine, refine until you have a question that perfectly articulates your biggest challenge and will tell you exactly what you need to know. Then, and only then, should you examine your different data sets and Big Data tools and assess which ones are useful and which aren’t. This will minimize both the amount of data you need to analyze and the size of the system you need to do it.

So before you use Big Data, you need a Big Question. Figuring out the Big Question goes well beyond the usual analyst’s skill set; it requires intimate and thorough knowledge of the business. While it’s nice for data scientists to be able to analyze all that data, someone has to think hard about where that analysis is going to get you.

Here are three guidelines for finding the Big Question and building a successful Big Data initiative:

  • Pick a good use case. Identify a well-defined problem that the analytics team already understands, ideally one that involves new unstructured or semi-structured data sources that couldn’t be included in previous analyses.
  • Prioritize which data you include in your analysis. Importing, cleaning, and organizing data sources isn’t easy (or cheap), so don’t try to use every possible source of data. Start with data you already understand, then add further sources sparingly to enrich your analysis.
  • Measure your results. Quantifiable outcomes are essential if you are going to determine whether you’ve succeeded. There is often an unrealistic expectation that valuable insights will suddenly emerge from Big Data. If they don’t, the project may be judged a failure even when it has delivered real improvement on a meaningful metric. That’s why you need to define key metrics in advance and measure them before and after the project.

Do that and you’ll know what your needle looks like, where to look for it, and whether or not you found it.