What machine learning will gain in 2016

Open source origins, more data sources, and the reinvention of tech dinosaurs will be pivotal for machine learning in 2016

Machine learning is no longer some esoteric practice limited to mystical incantations by data scientists. It's now a mainstream presence, thanks to ubiquitous big data and easier-to-use tooling and frameworks.

Here are four ways the machine learning landscape is likely to change over the coming year as it continues to both exert an influence on IT and be influenced by it.

Tech dinosaurs will continue to remake themselves around machine learning the way they did with the cloud

When the cloud first became the tech buzzword, it turned into a guiding light for all the old-school tech companies that were starting to look like they'd outlived their usefulness: HP, IBM, Microsoft. Now, machine learning's turning into their next big savior.

Some have already made this pivot quite adroitly. IBM in particular has shed its dead-end and no-go businesses (commodity PCs and servers) to make room for all the things big-data-empowered machine learning makes possible. Watson, its machine-intelligence-as-a-service platform, has gone from being a PR stunt to something that promises real utility for businesses, thanks to its public API set.

Who's next? Oracle, perhaps. As a database company, dealing with large volumes of data (machine learning's food) is part of its DNA. Producing some kind of machine-learning-as-a-service seems a shoo-in, although it's likely to be pitched mainly to its existing, and captive, customer base. It might well work for Oracle, but don't take that as proof it's a model others can follow.

Apache Spark will unshackle itself even further

An in-memory processing system with a host of machine learning functions, Spark has garnered acclaim and attention for both its speed and ease of use. Now there are plans to help make it even faster and more powerful, in part by engineering end runs around the limitations of the JVM. More machine learning work is also scheduled to land in future versions of Spark, including new Scala APIs.

Notably, Spark is continuing to grow on its own, away from the Hadoop big data framework where it rose to prominence. Hadoop's just one of the many data sources that Spark can use, and while machine learning needs lots of data to work well, there's nothing that says Spark has to rely on Hadoop to get it.

Machine learning must be open source by default from now on, no matter what form it turns up in

This may seem obvious to those who have been in either the machine learning or open source spaces for some time now. But such things aren't always obvious to those on the outside, and the way open source has more or less eaten the software industry whole came as a shock to those who weren't watching for it.

With machine learning, all algorithms are best when open by default. Not only does that make the work easier to check, it also means products that use the algorithms are more transparent about what they're doing. It becomes more difficult to perform machine-learning washing, which we're seeing in security, one of the last industries where it's still possible to fob off black-box solutions on customers. And it puts the emphasis more on the data, and on data sources -- the real secret sauce for machine learning going forward.

The struggle for data sources to feed machine learning will become all the more heated

The sheer amount of data now casually available is a foundational reason machine learning has finally taken off. Where we get that data from, and who supplies it, will become major issues for machine learning in all its forms. IBM's purchase of the Weather Channel's digital assets meant that the company now had a fresh flood of real-world data it could supply to its machine learning platforms and APIs. Who knows how many other suppliers of data will be snapped up or repurposed to feed machine learning's growing hunger?

Copyright © 2016 IDG Communications, Inc.