Big data is the latest hot industry trend. If you're in the kind of company where upper management is putting pressure on IT to be trendy, it's time to plan. The big question about big data: Plan for what, exactly?
With big data, you have only two concerns, but they are, naturally, big ones: where the data will come from and what your company will do with it. Solve these and you have big data licked. But if upper management suffers from Great Dictator disease and wants the fruits of big data regardless of whether there's any big data to draw from, big data will have you licked in the end.
Here's how my consulting company solved these concerns for one of our clients:
Big data solution No. 1: Doing big data without any
"I need big data, and I can't find any!" my client told me. His anxiety was palpable as he explained his situation. "I've been reading in all the IT publications that we need to do big data or our competitors will leave us in the dust.
"So I got my whole team together to look for some," he continued. "We looked in the data center. We looked in every office and cubicle. My help desk analysts even went under desks and looked in everyone's trash, but we couldn't find any at all. You're my consultant for this sort of thing. What should we do?"
I'm in business to help my clients, not to say no, so we got down to work. Eventually we put together a program that made sense. Here's the methodology:
Gather requirements. Big data is no different from any other IT initiative in this respect. You have to start with what the business needs. My team of crack consultants interviewed the company's key stakeholders, asking for their most important questions and (this part was critical) the answers they wanted.
Install Hadoop. IT projects have to be fully buzzword-compliant or they'll fail. For a big data project, this means Hadoop. If you don't want to invest staff time and energy learning this technology, do what my client did: Build a virtual server, install MySQL on it, and assign the name "Hadoop" to the server. When your BDSC (big data steering committee) asks if you've installed Hadoop, you can answer in the affirmative with a clear conscience.
Build a random big data generator. Not every company has a lot of big data lying around, waiting to be loaded. A random data generator is an inexpensive alternative to tracking down actual big data and building the input processing systems to load it. If the company has no big data to load, this is politically safer than going back to the BDSC to give them the bad news.
Build a data-biasing module. This is crucial. The data-biasing module adjusts the randomly generated big data so that the analytics generated from it provide the answers everyone wants.
This last step is even more important than you might expect. Without it, execs who query big data will get answers that are probably at least as good as the ones they'd arrive at on their own (unless they use some other random-number-driven decision-support system, like a dartboard, coin toss, Ouija board, or Magic 8 Ball, at which point it's a tie).
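For do-it-yourselfers who'd rather skip the crack consultants, here is a minimal Python sketch of the random big data generator and the data-biasing module. It assumes the "analytics" boil down to a single average and that the answer everyone wants is a target value for that average. The function names, the uniform distribution, and the 87.5 satisfaction score are hypothetical illustrations, not anything the client actually built.

```python
import random
import statistics


def generate_random_big_data(n_rows: int, seed: int = 42) -> list[float]:
    """Hypothetical random big data generator: uniform noise in [0, 100].

    Seeded so the 'insights' are reproducible across BDSC meetings.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, 100.0) for _ in range(n_rows)]


def bias_data(data: list[float], desired_mean: float) -> list[float]:
    """Hypothetical data-biasing module.

    Shifts every value by the same amount so the average of the data
    equals the answer gathered during requirements interviews.
    """
    shift = desired_mean - statistics.fmean(data)
    return [x + shift for x in data]


if __name__ == "__main__":
    raw = generate_random_big_data(100_000)
    answer_they_want = 87.5  # hypothetical target from stakeholder interviews
    biased = bias_data(raw, answer_they_want)
    print(f"Average customer satisfaction: {statistics.fmean(biased):.1f}")
```

Shifting every value by a constant keeps the data just as random as before while guaranteeing that the average comes out exactly where the stakeholders said it should, which is the whole point of the methodology.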