‘To supervise or not to supervise’ is a question at the heart of machine learning

A guide to unsupervised and supervised machine learning and understanding their application by businesses today


Following up on my last article, “Demystifying machine learning,” it’s clear that the machine learning space has blossomed over the last few years. Machine learning technologies are gaining momentum and application in the enterprise. Competition is fierce, with many new companies entering the market, and it is not surprising that this has led to some confusion and doubt, particularly as new entrants try to find their way, and their voice, in a world of competing visions.

As a result, there is much debate about which technology and methodology is best and, specifically in the field of machine learning, which methods are the right ones to use for a task. Each vendor will undoubtedly have a perspective to bring to the table and, of course, there is room for many differing approaches. However, what is not in dispute is the methods themselves: what they are and what they should be used for.

I see a growing need for clarification and education about artificial intelligence and especially machine learning. To the extent that it is possible to detail, in simple terms, some of the methods used by the leading intelligent content analytics platforms and to investigate why they are being used, I would like to contribute to an effort to clear up some of the confusion around machine learning and separate the hype and misinformation from reality.

Let’s start with a question that is at the heart of machine learning, especially in my field of legal AI and intelligent content analytics. That is, to supervise or not to supervise? Before I go into details of each method, let’s consider what each of them means using an analogy from children’s learning patterns.

Child’s play

National curricula and accepted wisdom across the world stipulate that schoolchildren are taught by a teacher and that they are “supervised” as a means of assisting their learning. Toddlers know very little when they start to learn from their parents but, by the time they start school, children begin to pick up everything they need to know based on a defined plan.

A curriculum or teaching plan allows children to apply different knowledge to different tasks, without being explicitly taught how to perform each task. This is essentially supervised learning, where the “model” in this case is a child in a classroom setting. The same holds true for machine learning.

So, what is unsupervised learning? Let’s call it “break time,” or when the children go out to play. In a classroom setting, children are normally grouped based on set criteria such as age, learning ability, or skill set. In the playground, where no supervision is given, children cluster together based on very different qualities. For example, if the children pick up a ball and start to play a game, the classroom grouping may go out of the window and other grouping criteria may start to play more of a role.

Does this mean that unsupervised learning is worse than supervised learning? Clearly not. The children are learning different things from different scenarios, as are the teachers on patrol. Teachers might learn which children have common interests. They might observe which children are happy to play by themselves and which are happier in a group. They might spot leaders or individuals who are happier to follow others. Friendships may form, and social skills are put to the test.

The observations from this unsupervised, automatic grouping are endless. In the classroom setting, the teacher or supervisor has a very strong influence, but in the playground, where grouping is unsupervised and undirected, the supervisor has very little say in what the groups may be and how they are formed.

Does that mean one setting is better for learning than the other? No, not in the case of children. Children learn many things that cannot be taught in the classroom when they are playing. The two settings are equally valuable in raising a balanced individual. As you’ll see, the same applies to software. So, let’s turn this back to machine learning and the platforms that use learning methods.

Supervised learning

Let’s start with supervised learning. In the context of machine learning, supervised learning is used by many different platforms, with many different algorithms, to analyze and sort data based on labeled examples. Some of the most common methods include support vector machines (SVM), maximum entropy (with and without conditional random fields), and deep learning with backpropagation.

Each of these methods could be likened to a classroom where models or teaching plans are created based on the task at hand: mathematics, English, French, or science, for example. Each subject or model has a specialist teacher. You would not expect a French teacher to be an expert in design tech, so each model is taught based on the teacher’s expertise and on the examples the teacher provides for the class to learn from.
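
To make the classroom picture concrete, here is a minimal sketch in Python of learning from labeled examples. It uses a perceptron, which is a far simpler relative of the methods named above and not one of them, and the sample texts, the “contract” versus “invoice” labels, and the function names are all invented for illustration.

```python
# Toy supervised learning: a perceptron learns from labeled examples.
# The documents, labels, and function names are illustrative assumptions.

def featurize(text):
    """Turn a text into a bag-of-words count dictionary."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def score(weights, features):
    """Weighted sum of the features; positive means label +1."""
    return sum(weights.get(w, 0.0) * c for w, c in features.items())

def train_perceptron(examples, epochs=10):
    """examples: list of (text, label) pairs with label +1 or -1."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            features = featurize(text)
            # The "teacher" corrects the model only when it is wrong or undecided.
            if label * score(weights, features) <= 0:
                for w, c in features.items():
                    weights[w] = weights.get(w, 0.0) + label * c
    return weights

def predict(weights, text):
    return 1 if score(weights, featurize(text)) > 0 else -1

# +1 = contract clause, -1 = invoice line (hypothetical labels)
training_data = [
    ("the parties agree to the terms", 1),
    ("this agreement is governed by law", 1),
    ("invoice total amount due", -1),
    ("payment due on receipt of invoice", -1),
]

weights = train_perceptron(training_data)
print(predict(weights, "the parties agree this agreement is binding"))  # → 1
print(predict(weights, "invoice amount due on receipt"))                # → -1
```

The point of the sketch is the role of the labels: every correction the model receives comes from an example whose right answer was supplied in advance, just as the teacher supplies worked examples to the class.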

Unsupervised learning

In the case of deep learning, the data scientist provides examples after the model has learned something in an unsupervised way. That is important, because it points to a combination of methods. Put simply, unsupervised learning allows a system to work out links and inferences in the data, without using any preconceived ideas. The system is learning about the data, from the data only.

So, what are some of the unsupervised learning methods used in intelligent content analytics? Examples of common algorithms and methods include latent semantic indexing and analysis, “nearest neighbor” methods such as the k-nearest neighbors algorithm (k-NN), singular value decomposition (SVD), Word2Vec for word and phrase detection and reduction, and Naïve Bayes.

This is by no means an exhaustive list, but what these methods have in common is that they can all be applied initially without labeled examples, although k-NN and Naïve Bayes are more commonly used as supervised classifiers. Like the children in the playground, you see clusters of documents or words based on similarities or other autodetected features.

One example of this being used is in software for the detection of near duplicates, where a system groups items based on the similarity of the words, phrases and sections included in a document. A further example is when a system performs clustering of information based only on the similarity of each given section. Both methods are simple in nature and require no supervision.
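
As a rough illustration of near-duplicate detection, the Python sketch below compares documents by the three-word sequences (“shingles”) they share and groups those that overlap enough. Jaccard similarity, the shingle size, the 0.5 threshold, and the sample sentences are assumptions chosen for the example, not a description of any particular product.

```python
# Toy near-duplicate detection with word shingles and Jaccard similarity.
# The threshold, shingle size, and sample documents are illustrative assumptions.

def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_near_duplicates(docs, threshold=0.5):
    """Greedily place each document in the first group whose first member is similar enough."""
    groups = []  # each group is a list of document indexes
    sets = [shingles(d) for d in docs]
    for i, s in enumerate(sets):
        for group in groups:
            if jaccard(s, sets[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar group found, so start a new one
    return groups

documents = [
    "the supplier shall deliver the goods within thirty days of the order",
    "the supplier shall deliver the goods within sixty days of the order",
    "either party may terminate this agreement with written notice",
]

print(group_near_duplicates(documents))  # → [[0, 1], [2]]
```

No one tells the code which documents belong together. The first two sentences share 7 of their 13 distinct shingles, so they land in the same group, while the third shares none and stands alone.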

Deep learning

Let’s focus on the combination of unsupervised and supervised learning, using deep learning as an example. To achieve the best balance and overall learning skill set, a combination of the two methods of machine learning is often best.

This allows a system to see and learn things that potentially were not visible at first, and to forge a path and analyze the data without supervision. Then, when it’s given known data examples, a system can look to fit what it has learned to the data it is being told is of a set type.
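
The two-step idea, finding structure first and fitting known examples to it afterwards, can be sketched in a few lines of Python. This is a deliberately simplified stand-in (two-cluster k-means followed by a majority vote), not the pretraining used in deep learning systems, and the points, label names, and function names are hypothetical.

```python
# Toy "learn structure first, then attach labels" pipeline.
# Step 1 is unsupervised (k-means with k=2); step 2 uses a few labeled examples.
# The points, label names, and function names are illustrative assumptions.

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(centroids, p):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i], p))

def kmeans2(points, iterations=10):
    """Two-cluster k-means, seeded with the first point and the point farthest from it."""
    centroids = [points[0], max(points, key=lambda p: dist2(points[0], p))]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def name_clusters(centroids, labeled):
    """Give each cluster the most common label among the labeled points that fall in it."""
    votes = [{} for _ in centroids]
    for point, label in labeled:
        tally = votes[nearest(centroids, point)]
        tally[label] = tally.get(label, 0) + 1
    return [max(t, key=t.get) if t else None for t in votes]

# Step 1: no labels at all, only feature points.
unlabeled = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids = kmeans2(unlabeled)

# Step 2: a handful of known examples tell the system what each group is.
labeled = [((1.1, 0.9), "contract"), ((5.1, 5.0), "invoice")]
names = name_clusters(centroids, labeled)

print(names[nearest(centroids, (4.9, 5.2))])  # → invoice
```

Only two labeled examples are needed here because the grouping was already worked out from the unlabeled data; the labels merely give the groups their names.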

This is similar to homework for a child, where the child learns to apply thinking to a problem and is then shown the correct answers the next day in the classroom. If the student or the system is correct, no adjustment or teaching is required.

However, if the answers were slightly off, you can adjust the understanding and do a bit more supervised learning. In a neural network, this adjustment is made with backpropagation: the error between the system’s output and the output it is being told is correct is passed back through the layers, and the weights of each layer are adjusted to reduce it.
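
For readers who want to see the mechanics, here is a minimal Python sketch of backpropagation on a tiny network with two inputs, two hidden units, and one output. The starting weights, the learning rate, the single training example, and the omission of bias terms are all simplifying assumptions made for illustration.

```python
# Toy backpropagation: a 2-2-1 sigmoid network fitted to one known example.
# The starting weights, learning rate, and example are illustrative assumptions.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient step on the squared error 0.5 * (output - target)**2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output, passed through the sigmoid's derivative.
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error signal back to each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Adjust each layer's weights against its gradient.
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w - lr * d * xi for w, xi in zip(row, x)]
        for row, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out

x, target = [1.0, 0.5], 1.0          # the "correct answer" the system is told
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]

error_before = abs(forward(x, w_hidden, w_out)[1] - target)
for _ in range(200):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
error_after = abs(forward(x, w_hidden, w_out)[1] - target)

print(error_before, error_after)  # the second number should be smaller
```

Each pass compares the output with the known answer and nudges the weights in both layers, which is the “bit more supervised learning” of the homework analogy repeated many times.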

My view is that an optimum system contains both supervised and unsupervised methods. Building a system based on one method may produce acceptable results in many cases, but it will not achieve the best overall results.

It’s worth asking yourself whether you would prefer your child to grow up having only ever been taught in a classroom without unsupervised time, or whether a balanced education and a healthy combination of unsupervised and supervised learning is best.

I take the same view of software. Intelligent content analytics requires a balanced combination of both unsupervised and supervised learning. This makes it possible to learn in the best overall way, enabling the system to be applied to new challenges it has never encountered and in new domains, while applying the same underlying functions and methods it acquired from the very start of its learning journey.

With this new knowledge about machine learning, you can begin to make more sense of the facts and dispel the myths that lead to confusion and bad business decisions. And armed with this information, you can ask software vendors what they prefer their children to become, and why the software they are selling does not live up to that same expectation.

Copyright © 2018 IDG Communications, Inc.
