"The art part of big data is associated with discovery and explanation," Barth says. "You're looking for something you can't quite articulate. There's a phase of analytics that's exploration and discovery, in which you're generating hypotheses. Then comes modeling and application. In my view, traditional BI tends to be really far down the road after you understand the underlying analytics and correlations that relate to an issue you're seeing. The phase in which you don't understand it, the discovery phase, is where big data is useful."
There are seven distinct steps to answering a complex business question, NewVantage says.
Traditionally, companies have spent 80 percent or more of their time on the third and fourth steps, NewVantage says. But big data solutions offer up new ways of approaching these steps.
First and foremost, because of the relatively low cost and high capacity of big data platforms, organizations can load all of the data from their source systems rather than choosing particular data for the question at hand.
"While this may seem wasteful, it eliminates two important delays: writing programs to select just the data needed, and going back to the source systems multiple times as new insights generate new questions that need new data," NewVantage says. "Building traditional data marts and data warehouses is extraordinarily complex and costly. The broad range of open source offerings coupled with flexible, scalable grid systems create an environment that not only drives down costs, but also offers the potential of decreasing query times exponentially."
For instance, Barth points to a large financial services firm that wanted to perform multi-channel pathing analysis of its customers to understand which elements led to a sale and which led to attrition. To do so, the firm needed to integrate six months of session data with other channel data. The first attempt, using traditional relational databases, took tens of thousands of lines of SQL code, and the firm soon realized that it could only afford to access six days' worth of data rather than six months. The firm abandoned the attempt after calculating that the effort would take weeks.
"In a big data environment, they were able to write and execute it in under 100 lines of code," Barth says. "They executed it in less than 24 hours, processing hundreds of terabytes of session data. The analysis was on data they already had; they had it inside the bank. They just wanted to know what their own customers were doing on their own channels. This really unlocked that visibility into the way their bank was running."
But the key, Barth says, is to take it one more step. Once you understand what you're seeing, you can develop a model that explains it and metrics to measure your execution in improving your business against that model. That's where traditional BI comes in.
"The "new" and the "known" are not islands; they must be symbiotic systems connected to and feeding each other," NewVantage says. ""New" analyses require rapid access to all the "known" data representing the reality of today's business. Conversely, there must be a disciplined approach to promoting new insights, data and models to evolve the "known." Without this linkage, the systems diverge into incoherence that does not reconcile or scale."