Better analytics, better performance, but at what cost?
It seems clear that this kind of machine learning spells the death of traditional BI. Business intelligence, after all, is built on asserting an insight against a set of data: "I think warm weather makes people want to book cruise vacations. Run a report correlating temperature against cruise bookings."
The problem with this approach is that it depends on a human choosing the right correlation to test, and this is where it gets sticky. You're relying on the judgment, and the prejudices, of a person to decide which data are relevant. It's far more compelling to let the data itself identify what is relevant.
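To make the contrast concrete, here is a minimal sketch of the two styles of analysis. Everything in it is my own illustrative assumption, not anything from the story: the synthetic data, the column names (temperature, ad_spend, cruise_bookings), and the choice of a random forest to rank signals.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: bookings are actually driven mostly by ad spend,
# not by the temperature a human analyst might hypothesize about.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "temperature": rng.normal(20, 8, n),
    "ad_spend": rng.normal(50, 10, n),
    "day_of_week": rng.integers(0, 7, n),
})
df["cruise_bookings"] = (0.2 * df["temperature"]
                         + 1.5 * df["ad_spend"]
                         + rng.normal(0, 5, n))

# Traditional BI: a human asserts one hypothesis and tests one correlation.
print("temperature vs. bookings:",
      df["temperature"].corr(df["cruise_bookings"]))

# Machine learning: hand the model every column and let it rank relevance.
features = ["temperature", "ad_spend", "day_of_week"]
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df[features], df["cruise_bookings"])
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.2f}")
```

Nothing in the second approach required a human hypothesis; the model ranked the signals on its own.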
That's where this area becomes troubling.
Once you start down that road and say, "Let the data tell me what to do," the natural impulse is to get more data. That mobile company will strike a deal with, say, a consumer products company for information on customers' other purchasing habits to feed its churn analysis.
You might argue that this is still in the service of giving a customer more compelling reasons to stick with the wireless provider, so it's all good. What it brought to my mind, though, is a more troubling area: finance and credit, which has been a battleground over data collection and accuracy for years.
Unlike a poorly targeted mobile offer, which can be ignored, an inaccurate credit report has real-world consequences. It took political action to force credit agencies to identify their data sources and give consumers a way to correct inaccuracies.
A story I saw last week about ZestFinance, a new big data credit analysis company, brought this home. At a recent GigaOm conference, the company described itself as "a new style of underwriting company that uses 70,000 data signals and 10 parallel machine learning algorithms to assess personal loans." It looks at nontraditional signals, such as whether a would-be borrower has read a letter on its website, to better gauge whether someone is likely to repay a loan.
The notion is that, by exploiting all these data signals, the company will find people whom traditional credit agencies deem unworthy of a loan but who, on closer examination of these other data elements, are actually good credit risks.
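It's worth seeing what this style of underwriting looks like in code. The sketch below is emphatically not ZestFinance's system; it's a generic, hypothetical illustration of scoring repayment risk from a mix of traditional and nontraditional signals, with all data, feature names, and coefficients invented.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented signals: two traditional, one nontraditional.
rng = np.random.default_rng(1)
n = 5_000
signals = pd.DataFrame({
    "credit_score": rng.normal(650, 80, n),      # traditional
    "income": rng.normal(55_000, 15_000, n),     # traditional
    "read_site_letter": rng.integers(0, 2, n),   # nontraditional
})

# Synthetic outcome: repayment odds improve with each signal.
logit = (0.01 * (signals["credit_score"] - 650)
         + 0.00002 * (signals["income"] - 55_000)
         + 0.8 * signals["read_site_letter"])
repaid = rng.random(n) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(
    signals, repaid, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# An applicant a traditional score would reject can still rank as a
# good risk once the extra signals are factored in.
applicant = pd.DataFrame([{
    "credit_score": 580, "income": 60_000, "read_site_letter": 1}])
print(f"Predicted repayment probability: "
      f"{model.predict_proba(applicant)[0, 1]:.2f}")
```

The unsettling part is the flip side: the same model can downgrade an applicant for signals no one would think to dispute.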
That's all well and good, but what happens when the signals say you aren't a good credit risk? Or when you find out you're paying half a percentage point more on your mortgage than your neighbor? When you ask why, you may be pointed to the fact that your neighbor read a letter on the credit company's website. More likely, you won't find the reason at all: either the person you contact doesn't know, since, hey, it's in the algorithm, or the company won't tell you, because the algorithm is a trade secret.
I generally try to avoid being alarmist about technology trends, but this aspect of big data genuinely concerns me. The combination of data and credit analysis has always been explosive, and I don't expect the machine learning version to be any less so. It will be especially problematic given the vague judgment criteria and the secrecy the industry will likely try to impose as it shifts to machine learning.
So, BI is dead; long live BI. We're clearly on the cusp of a new way of generating insight, moving from inefficient, human-driven sifting to methods that use machine learning to identify relevant patterns and outcomes. We're in for a lively couple of decades as this new BI plays out.
Bernard Golden is the vice president of Enterprise Solutions for Enstratius Networks, a cloud management software company. He is the author of three books on virtualization and cloud computing, including Virtualization for Dummies. Follow Bernard Golden on Twitter @bernardgolden.