Machine learning floats all boats on big data's ocean

Machine learning is the unsung hero that powers many of the most sophisticated big data analytic applications

Machine learning is so pervasive that we can often assume its presence in big data applications without having to specifically call it out. About a year ago, I blogged about the "hardcore" big data use cases -- in other words, the applications that deliver the best results at "extreme scales." By the latter, I was referring to any combination of petabyte data volumes, real-time data velocities, and/or multistructured data varieties.

When compiling the list of applications in that article, I deliberately avoided listing "machine learning analytics." The reason: Machine learning is a tool used in many, if not most, of these analytic use cases, but it's not a use case in itself -- in other words, it's not a specific application domain in its own right. For that same reason, I didn't list schema design, metadata management, or data integration as big data use cases. Like machine learning, all of these contribute in varying degrees to realizing value from most big data analytic applications.


Machine learning's contribution to big data application ROI is twofold: boosting data scientist productivity and uncovering hidden patterns that even the best data scientists may have overlooked. These value points derive from machine learning's core function: enabling analytic algorithms to learn from fresh feeds of data without constant human intervention and without explicit programming. The approach allows data scientists to train a model on an example data set, then leverage algorithms that automatically generalize and learn both from that example and from fresh data feeds.
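To make the train-then-keep-learning idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a description of any particular product: the `OnlinePerceptron` class, the synthetic labeling rule, and all data are hypothetical. A simple perceptron is first fit on a small example set, then keeps updating itself, one record at a time, as a fresh feed arrives.

```python
import random


class OnlinePerceptron:
    """Hypothetical tiny linear classifier that learns one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def learn_one(self, x, label):
        # Classic perceptron rule: adjust the weights only after a mistake.
        error = label - self.predict(x)
        if error:
            step = self.learning_rate * error
            self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
            self.bias += step

    def accuracy(self, examples):
        return sum(self.predict(x) == y for x, y in examples) / len(examples)


def make_examples(n, rng):
    """Synthetic records; the label rule (x0 + x1 > 1) is an assumption."""
    examples = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        examples.append((x, 1 if x[0] + x[1] > 1 else 0))
    return examples


rng = random.Random(0)
training_set = make_examples(20, rng)    # the small "example data set"
fresh_feed = make_examples(2000, rng)    # stands in for a live data feed
holdout = make_examples(500, rng)        # used only to measure accuracy

model = OnlinePerceptron(n_features=2)

# Phase 1: train on the example set (a few passes over it).
for _ in range(5):
    for x, y in training_set:
        model.learn_one(x, y)
accuracy_after_training = model.accuracy(holdout)

# Phase 2: keep learning from the feed, with no retraining from scratch.
for x, y in fresh_feed:
    model.learn_one(x, y)
accuracy_after_feed = model.accuracy(holdout)

print(accuracy_after_training, accuracy_after_feed)
```

The point of the design is that `learn_one` is the only update path, so the same code serves both the initial training and the ongoing feed. Comparing the two printed accuracies shows how the model fared after each phase on this synthetic data; real feeds are far messier.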

In many ways, machine learning can be the ROI capstone of your big data initiative. Your investment in machine learning can help deepen whatever business case you've made for big data in the enterprise. That's because machine-learning algorithms grow even more effective as your data scales in volume, velocity, and variety. As such, it's another example of how big data's bigness can be its core driver of value, per my discussion in this recent article.

As Mark van Rijmenam says in this recent article on machine learning: "The more data is processed, the better the algorithm will become." Many of the machine-learning applications that he discusses -- ranging from speech and facial recognition to clickstream processing, search-engine optimization, and recommendation engines -- might be described as "sense-making analytics" (which, now that I think of it, I should have included in my list of hardcore big data applications).

Sense-making analytics involves continuous monitoring of feeds whose semantic patterns, context, and importance must be inferred from the stream. In support of automated sense-making, machine-learning algorithms must often handle feeds of daunting complexity, such as feeds that incorporate implicit semantic hierarchies among constituent objects, or environments where an overall sense must be gleaned in real time through correlation of multiple distinct streams. The streams may include various objects, such as data, video, images, speech, faces, gestures, geospatial data, and browser clicks. And the sense to be auto-extracted from streams, via machine learning, may be any blend of cognitive, affective, sensory, and volitional features, per my discussion in this recent blog.
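As a rough illustration of what "correlation of multiple distinct streams" can mean mechanically, the sketch below merges several timestamped feeds and flags moments when enough different sources are active within a short window. Note that this is a simple rule-based stand-in, not machine learning; in practice a learned model would replace the fixed rule. The function name `correlate_streams`, the stream names, and the events are all hypothetical.

```python
import heapq
from collections import deque


def correlate_streams(streams, window_seconds, min_sources):
    """Yield (timestamp, sources) whenever events from at least
    `min_sources` distinct streams fall within `window_seconds`.

    Each stream is an iterable of (timestamp, source, payload) tuples
    that is already sorted by timestamp.
    """
    merged = heapq.merge(*streams, key=lambda event: event[0])
    recent = deque()  # sliding window of (timestamp, source)
    for timestamp, source, _payload in merged:
        recent.append((timestamp, source))
        # Drop events that have aged out of the window.
        while timestamp - recent[0][0] > window_seconds:
            recent.popleft()
        sources = {s for _, s in recent}
        if len(sources) >= min_sources:
            yield timestamp, sorted(sources)


# Hypothetical feeds, each sorted by time (seconds).
clicks = [(1.0, "clicks", "home"), (2.0, "clicks", "cart"), (9.0, "clicks", "checkout")]
video = [(2.5, "video", "motion"), (20.0, "video", "motion")]
speech = [(3.0, "speech", "help"), (9.5, "speech", "cancel")]

matches = list(correlate_streams([clicks, video, speech],
                                 window_seconds=2.0, min_sources=3))
print(matches)  # [(3.0, ['clicks', 'speech', 'video'])]
```

Because `heapq.merge` consumes its inputs lazily, the same function works on generators that yield events as they arrive, which is closer to the real-time setting described above.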

In order to find the signals in all this noise, "deep learning" is an important tool in the data scientist's machine-learning repertoire. As van Rijmenam discusses, deep learning, which often leverages neural networks, helps to extract sense from streams that may involve a hierarchical arrangement of semantic relationships among component objects. "[Deep learning] is capable of breaking down different characteristic constituents in the data and uses those characteristics to learn itself different combinations of those characteristics to know what it sees (a face for example) or what to do (walk for example with robots)."

Clearly, machine learning is a fundamental tool in building a world that can sense and react to dynamic, distributed patterns. Humanity's ability to detect and respond to real-time threats and other issues -- terrorist activities, hurricanes and other natural disasters, and so forth -- depends on automated sifting, sorting, and correlation of events across myriad streams.

Without this ubiquitous capability, the human race risks drowning in its own big data.

This story, "Machine learning floats all boats on big data's ocean," was originally published in the Extreme Analytics blog.


Copyright © 2014 IDG Communications, Inc.