We in the data business have been saying that one of the big reasons you want Hadoop or other data tools is to perform machine learning on data. For the most part, this isn’t happening. Instead, machine learning has been a major contributor to the strategic snake oil reserve.
The definition of machine learning has been stretched beyond recognition. The best explanation for its common use today: “statistics, pattern recognition, and artificial intelligence” (but never say AI because the Cylons will get us or we’ll have another AI winter).
Don't get me wrong: machine learning is both real and useful, but beyond finance, the knowledge to figure out how these tools apply to business is rare. Last week, I wrote that prepackaged algorithmic solutions are the future of “big data,” but if you look today for those solutions, you’ll find mostly recommendation engines or “fraud detection,” which remain two of the best-understood areas.
What we lack goes beyond better tools. We lack imagination. I’d argue that our entire industry grew out of the electronification of paper and has for the most part barely moved beyond elementary data processing.
In the data trade, the vendors fight among themselves (if they’re weak) and market against Oracle (if they’re stronger), but the real enemy is Microsoft Excel. The battle isn’t the data center vs. the cloud or Hadoop vs. Oracle or big data vs. small data. The real battle is the data cloud vs. the meat cloud -- the latter being people who pull reports into Excel and make emotional decisions backed by data.
Meanwhile, the tech industry likes to talk about solutions while selling platforms. This is the only industry where customers buy the equivalent of a hammer and expect a house to come out of the business end all by itself, only to be disappointed until someone sells them an even better hammer.
Our imagination doesn’t allow us to see that the noise detection or intrusion detection algorithm that finds anomalies in signal processing can also be set up to find new services or products a company could be marketing to a larger audience. Nonetheless, although human creativity may be needed to create the campaign or the product, machines crunching cold hard numbers can make many key decisions independently ... after we deploy the appropriate technology in the right configuration.
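To make the point about reusing an anomaly detector concrete, here is a minimal sketch in plain Python. It is a simple z-score check, not any particular vendor's algorithm; the function name, the threshold, and both data sets are made up for illustration. The same function that flags a spike in a sensor feed also flags a service whose signups are running far ahead of the rest.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the indexes of values more than `threshold` sample standard
    deviations away from the mean (a basic z-score test).

    Needs at least two values; returns [] when all values are identical.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical signal-processing input: one reading spikes.
    sensor_readings = [0.9, 1.1, 1.0, 1.0, 0.9, 1.1, 1.0, 5.0]
    print("Odd readings at:", find_anomalies(sensor_readings))

    # Hypothetical business input: weekly signups per service.
    services = ["A", "B", "C", "D", "E", "F", "G", "H"]
    weekly_signups = [10, 11, 9, 10, 12, 10, 11, 50]
    flagged = [services[i] for i in find_anomalies(weekly_signups)]
    print("Services worth a closer look:", flagged)
```

A z-score is about the crudest detector there is, and a single large outlier inflates the standard deviation it is measured against. The point is only that the math does not care whether the numbers come from a sensor or a sales report.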
Machine learning in finance provides an instructive example. Last week at the Red Hat Partner Conference, I presented a live demonstration of a distributed Monte Carlo simulation running on Spark against data in a JBoss Data Grid to determine what the liquidity risk was in my portfolio. Monte Carlo, along with other tools common to finance, comes from physics. It isn’t uncommon in finance to use “machine learning” and even to implement trading strategies as algorithms.
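For a sense of what a Monte Carlo run looks like in miniature, here is a single-machine sketch in plain Python. It is not the Spark and JBoss Data Grid demo; the function name, the cash-flow model (independent, normally distributed daily net flows), and every number in it are assumptions chosen purely for illustration.

```python
import random


def simulate_shortfall_probability(starting_cash, mean_daily_flow,
                                   daily_volatility, horizon_days,
                                   n_trials, seed=0):
    """Estimate the probability that the cash balance drops below zero
    at any point within the horizon.

    Assumption: each day's net cash flow is an independent draw from a
    normal distribution. Real liquidity models are far richer than this.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    shortfalls = 0
    for _ in range(n_trials):
        balance = starting_cash
        for _ in range(horizon_days):
            balance += rng.gauss(mean_daily_flow, daily_volatility)
            if balance < 0:
                shortfalls += 1
                break  # this trial already ran out of cash
    return shortfalls / n_trials


if __name__ == "__main__":
    # Hypothetical portfolio: 1,000 in cash, losing 10 a day on average,
    # with daily swings of about 150, over a 30-day horizon.
    p = simulate_shortfall_probability(
        starting_cash=1000.0,
        mean_daily_flow=-10.0,
        daily_volatility=150.0,
        horizon_days=30,
        n_trials=10_000,
        seed=42,
    )
    print(f"Estimated chance of a cash shortfall within 30 days: {p:.1%}")
```

Every trial is independent of the others, which is why this kind of workload spreads so easily across a cluster: hand each worker a batch of trials and add up the shortfall counts at the end.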
What will it take to apply machine learning at that level to other business areas? It's virtually assumed that we'll all have self-driving cars in the next decade or two. We already have planes and space vehicles that can (and do) fly themselves. But where are the machine learning applications that tell us which bills to pay when and what invoices will come in at what time? Instead, we turn to the meat cloud operating Microsoft Excel.
How about machine learning that actually helps us make strategic business decisions? It's hard to know whether to call the end result a technology solution or management consulting. Do we bring in tech geeks or math geeks or a bunch of MBAs? Who manages a project like that in a large company? This is the kind of work that requires R&D. What kind of hairy simulations would you need to test the outcome of business decisions made by machines?
This stuff is hard and not everyone can do it. Look out your window and you’ll see that as a species only a few of us have any imagination at all. But the reward of having more accurate, more automated, less easily manipulated decision-making is the kind of competitive edge that serves as king- or queenmaker for entire industries.