Figure 4: AI, machine learning, and deep learning complement BI.
Please note that BI, statistics, AI, machine learning, and deep learning can all do more than what is described in Figure 5; this example simply demonstrates how these methods can answer a series of progressive business questions.
Figure 5: Integrating BI, statistical analysis, predictive AI, machine learning, and deep learning.
While statistical modeling on one side and machine learning and deep learning on the other are both used to build models of the business situation, there are some key differences between the two, as Figure 6 shows. In particular:
- Statistical modeling requires a formal mathematical equation relating the inputs to the outputs. In contrast, machine learning and deep learning don’t try to find that mathematical equation; instead they simply try to re-create the outputs given the inputs.
- Statistical modeling requires an understanding of the relationships among the variables and makes assumptions about the statistical properties of the data population. Machine learning and deep learning do not.
Figure 6: Statistical modeling vs. machine learning.
Typically, because statistical modeling requires a mathematical equation and an understanding of the relationships among the data, statistical models take a relatively long time to build as the statistician studies and works with the data. But if completed successfully—that is, the equation is found and the statistical relationships among the data are very well understood—the model can be killer.
Machine learning and deep learning models, on the other hand, are very fast to build but may not perform well at first. But because they are so easy to construct in the early stages, many algorithms can be tried simultaneously, with the most promising of them iterated on continuously until model performance becomes extremely good.
Machine learning and deep learning models also have the added advantage of continuously learning from new data “on their own,” and thus improving their performance.
Should the nature of the data change, the machine learning and deep learning models simply need to be retrained on the new data; whereas the statistical models typically need to be rebuilt in whole or in part.
Machine learning and deep learning models also excel at solving highly nonlinear problems (it’s just harder for people to do this; those equations get very long!). This attribute of machine learning and deep learning really comes in handy as microsegments become the norm (think customer segments of one, mass customization, personalized customer experiences, and personal and precision medicine), and as processes and root-cause analyses become increasingly multifactored and interdependent.
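To make the contrast concrete, here is a minimal sketch in Python. The use of scikit-learn and the synthetic two-variable dataset are my own illustrative choices, not a prescribed toolchain: the statistical model’s deliverable is an explicit equation with interpretable coefficients, while the machine learning model skips the equation and simply learns to reproduce the outputs from the inputs.

```python
# A minimal sketch of the statistical vs. machine learning contrast.
# scikit-learn and the synthetic data are illustrative assumptions;
# the "hidden" relationship below is made up for demonstration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))                      # two input variables
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, 500)  # hidden relationship plus noise

# Statistical modeling: hypothesize an explicit equation
# y = b0 + b1*x1 + b2*x2 and estimate its coefficients.
# The equation itself is the deliverable.
stat_model = LinearRegression().fit(X, y)
print("equation: y = %.2f + %.2f*x1 + %.2f*x2"
      % (stat_model.intercept_, *stat_model.coef_))

# Machine learning: no equation is hypothesized; the forest simply
# learns to reproduce the outputs from the inputs.
ml_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("ML prediction for inputs (5, 5):", ml_model.predict([[5.0, 5.0]]))
```

The linear model’s value lies in the recovered equation; the forest’s value lies purely in its predictions, which is also why retraining it on changed data is cheap.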
How AI, machine learning, and deep learning differ
So far, I have lumped AI, machine learning, and deep learning together. But they are not exactly the same, as Figure 7 shows.
Figure 7: AI vs. machine learning vs. deep learning.
Generally speaking:
AI is where machines perform tasks that are characteristic of human intelligence. It includes things like planning, understanding language, recognizing objects and sounds, learning, and problem solving. This can be in the form of artificial general intelligence (AGI) or artificial narrow intelligence (ANI).
- AGI has all the characteristics of human intelligence, with all our senses (maybe even more) and all our reasoning, and so can think just like we do. Some describe this as “cognitive”—think C3PO and the like.
- ANI has some facets of human intelligence but not all; it’s used to perform specific tasks. Examples include image classification in a service like Pinterest and face recognition on Facebook. ANI is the current focus of most business applications.
Machine learning is where machines use algorithms to learn and execute tasks without being explicitly programmed (that is, they do not have to be provided specific business rules to learn from the data; put another way, they don’t need instructions such as “if you see X, do Y”).
Deep learning is a subset of machine learning, generally using artificial neural networks. The benefit of deep learning is that, in theory, it does not need to be told which data elements (or “features” in machine learning speak) are important, but most of the time it needs large amounts of data.
Figure 8 shows the timeline of AI’s evolution.
Figure 8: AI’s historical timeline.
The differences among explicit programming, machine learning, and deep learning can be better understood through the example of handwritten-number recognition. To anyone older than five, recognizing handwritten numbers isn’t hard. We’ve learned (been trained) over the years by parents, teachers, siblings, and classmates.
Now imagine getting a machine to do the same through explicit programming. In explicit programming, you have to tell the machine what to look for. For example, a round object is a zero, a line that goes up and down is a one, and so on. But what happens if the object isn’t perfectly round, or the ends don’t touch so it’s not fully round? What happens when the line doesn’t go up and down but instead tips sideways, or when the top part of the line has a hook (like “1”): is it now closer to a 7? The many variations of handwritten numbers make it difficult to write an explicit program; you would be constantly adding new “business rules” to account for the variations.
As Figure 9 shows, in the machine learning approach, you would show the machine examples of 1s, 2s, etc., and tell it what “features” (important characteristics) to look for. This feature engineering is important because not all characteristics are important. Examples of important characteristics might be number of closed loops, number of lines, direction of lines, number of line intersections, and positions of intersections. Examples of unimportant characteristics might be color, length, width, and depth. Assuming you feed the machine the right features and provide it with examples and answers, the machine would eventually learn on its own how important the features are for the different numbers, and then hopefully be able to distinguish (or classify) the numbers correctly.
Notice that with machine learning you have to tell the machine the important features (that is, what to look for), so the machine is only as good as the person identifying the appropriate features.
The promise of deep learning is that no one has to tell the machine which features to use (that is, which ones are most important); it will figure this out automatically. All you need to do is feed it all the features, and it will select the important ones on its own. While this is an obvious advantage, it comes at a price: deep learning requires large volumes of data and long training times, which in turn demand significant computational processing capabilities.
Figure 9: Explicit programming vs. machine learning and deep learning: handwritten-number recognition.
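Here is a minimal sketch of the machine learning and deep learning sides of this example in Python, using scikit-learn’s built-in 8x8 digits dataset. The three hand-engineered features are my own illustrative picks, not what a production recognizer would use; the point is that the first model is only as good as the chosen features, while the small neural network works directly from raw pixels.

```python
# A minimal sketch of machine learning (hand-engineered features) vs.
# deep learning (raw inputs) on scikit-learn's built-in 8x8 digits
# dataset. The three features below are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Machine learning: a person decides which features matter -- here,
# total ink, ink in the top half, and ink in the left half of each image.
def engineer(flat_images):
    imgs = flat_images.reshape(-1, 8, 8)
    return np.column_stack([
        imgs.sum(axis=(1, 2)),            # total ink
        imgs[:, :4, :].sum(axis=(1, 2)),  # ink in the top half
        imgs[:, :, :4].sum(axis=(1, 2)),  # ink in the left half
    ])

ml = LogisticRegression(max_iter=5000).fit(engineer(X_train), y_train)
print("accuracy with hand-picked features:", ml.score(engineer(X_test), y_test))

# Deep learning (here, a small neural network): feed in the raw pixels
# and let the network discover useful features on its own.
dl = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
dl.fit(X_train, y_train)
print("accuracy from raw pixels:", dl.score(X_test, y_test))
```

On this dataset, the feature-based model’s accuracy is capped by how informative those three features are, while the network typically scores far higher because it discovers richer features on its own.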
AI model concepts: an overview
The idea behind machine learning and deep learning models is that they learn from the data they are given (things they have seen before) and can then generalize to make good decisions on new data (things they have not seen before).
But what constitutes a model? One definition of models is that they consist of three components:
- Data: Historical data is used to train the model. For example, when learning to play the piano, the data you are fed is different notes, different types of music, different composer styles, etc.
- Algorithms: General rules that models use for the learning process. In the piano example, your internal algorithm might tell you to look for the musical notes, how to move your hands on the keys, how and when to press the pedals, etc. Figure 10 shows the relationship between models and algorithms.
- Hyperparameters: These are “knobs” that data scientists adjust to improve the model performance, and they are not learned from the data. Again using the piano example, hyperparameters include how often you practice the musical piece, where you practice, time of day you practice, piano you use for practice, etc. The thinking is that adjusting these “knobs” improves your ability to learn how to play the piano.
When you put all of this together, you become a piano-playing model. In theory, depending on how well you’re trained, new musical pieces you’ve never seen before could be placed in front of you and you’d be able to play them.
Figure 10: Relationship between models and algorithms.
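To tie the piano analogy back to practice, here is a minimal sketch in Python of how the three components show up in scikit-learn; the synthetic dataset and the specific knob values are assumptions made purely for illustration.

```python
# A minimal sketch of the three model components in scikit-learn; the
# synthetic dataset and the hyperparameter values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 1. Data: the historical examples the model learns from.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 2. Algorithm: the general learning procedure -- here, a random forest.
# 3. Hyperparameters: "knobs" the data scientist sets by hand; they are
#    not learned from the data.
algorithm = RandomForestClassifier(
    n_estimators=200,  # hyperparameter: how many trees to grow
    max_depth=5,       # hyperparameter: how deep each tree may grow
    random_state=0,
)

# The trained model is what you get by running the algorithm over the data.
model = algorithm.fit(X, y)
print("training accuracy:", model.score(X, y))
```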
Types of machine learning
Machines, just like people, can learn in different ways, as Figure 11 shows. I’ll again use the piano-training analogy to explain:
- Supervised: Your instructor shows or tells you both the right way and the wrong way to play. In an ideal situation, you are given equal numbers of examples of how to play the right and wrong ways. Essentially, the training data consists of a target/outcome variable (or dependent variable) that is to be predicted from a set of predictors (independent variables). Using these sets of variables, you generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of performance on the training data. A business example of supervised training is showing the system examples of loan applications (consisting of predictors like credit history, work history, asset ownership, income, and education) that were approved or rejected (the target outcomes and decisions).
- Unsupervised: You’re on your own: nobody tells you how to play, so you make up your own ideas of right and wrong, with the goal of optimizing a parameter that’s important to you, such as speed of finishing the piece, the ratio of loud notes to soft notes, or the number of unique keys you touch. Essentially, the data points have no labels associated with them to tell you what’s right or wrong. Instead, the goal is to organize the data in some way or to describe its structure. This can mean grouping it into clusters or finding different ways of looking at complex data so that it appears simpler or more organized. Unsupervised learning is usually less effective at training the model than supervised learning, but it may be necessary when no labels exist (in other words, the “right” answers are not known). A common business example is market segmentation: It’s frequently unclear what the “right” market segments are, but every marketer is looking for segments of natural affinities so they can approach those segments with just the right messages, promotions, and products. (Both supervised and unsupervised learning are sketched in code after this list.)
- Semisupervised: A combination of supervised and unsupervised. This is used where there is not enough supervised data. In the piano example, you would receive some instruction but not a lot (maybe because lessons are expensive or there aren’t enough teachers).
- Reinforcement: You’re not told what the right and wrong ways to play are, and you don’t know what parameter you’re trying to optimize, but you are told when you do something right or wrong. In the case of piano training, your teacher might hit your knuckles with a ruler when you play the wrong note or use the wrong tempo, and give you a backrub when you play things well. Reinforcement learning is very popular right now because, in many situations, there isn’t enough supervised data available for every scenario, yet the machine can still be told whether a given action was good or bad. For example, in the game of chess, there are too many permutations of moves to document (label). But reinforcement learning can still tell the machine when it makes right or wrong decisions that advance it toward winning (such as capturing pieces and strengthening positions in chess).
- Transfer learning: You use your knowledge of playing the piano to learn another instrument, because you’ve built certain transferable skills (such as the ability to read notes and maybe even nimbleness in your hands) that you can build on to learn how to play the trumpet. Transfer learning is used because it reduces learning time, which can be significant (several hours or even several days) for models that use deep learning architectures.
Figure 11: Modeling: common types of learning/training.
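As promised above, here is a minimal sketch in Python of the first two learning types, using scikit-learn; the “loan” and “customer” datasets are synthetic stand-ins I generated for illustration, not real business data.

```python
# A minimal sketch of supervised vs. unsupervised learning with
# scikit-learn; the "loan" and "customer" datasets are synthetic
# stand-ins for the business examples above.
import numpy as np
from sklearn.datasets import make_classification, make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: every historical loan application carries a label
# (approved = 1, rejected = 0) that the model learns to predict.
X_loans, approved = make_classification(n_samples=500, n_features=5,
                                        random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_loans, approved)
print("decision for a new applicant:", clf.predict(X_loans[:1]))

# Unsupervised: customer records carry no labels; the algorithm groups
# them into clusters (candidate market segments) by natural affinity.
X_customers, _ = make_blobs(n_samples=500, centers=4, random_state=0)
segments = KMeans(n_clusters=4, random_state=0).fit_predict(X_customers)
print("customers per segment:", np.bincount(segments))
```

The key difference is visible in the fit calls: the supervised model receives both the inputs and the answers, while KMeans receives only the inputs and must impose structure on them itself.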
Common machine learning algorithms
As Figure 12 shows, common algorithm types include: