AI, machine learning, and deep learning: Everything you need to know

All about the business benefits, technology frameworks and models, and application of artificial intelligence for better business outcomes


Let’s go through another example, this time of a lazy machine learning algorithm used in a recommender engine, similar to what you might see from many websites. In this case, you have data on four pet lovers, and you know their preferences in terms of the type of pets they like and how much they like specific pets. Let’s assume there is a fifth pet lover (Amy) about whose preferences you know less.

Your goals are twofold: Predict the rating that Amy would give to a specific pet, and predict which pets Amy may like based on her preferences for pet attributes. You should see that this closely resembles a similarity problem: you use the similarity between people you know more about and someone you know less about.

There are two ways to determine similarity in recommendation systems: collaborative and content-based, and collaborative can be further defined as user-based or item-based.

In the collaborative method, you need the ratings of users in the community. Applying this through the user-based approach, you predict what users like based on the likes of similar users in the community. By contrast, using the item-based approach, you predict what users like based on similarity among items that the community likes.

The content-based method does not use ratings of users in the community. Instead, it’s based on the characteristics of the items themselves, and the values (or labels) assigned to these characteristics are provided by a domain expert.

Each method has its advantages and disadvantages, as Figure 22 shows.


Figure 22: Recommender example: similarity intuition.

Consider this example: In the collaborative method, you use the pet ratings of other users to predict an individual’s unknown rating of a pet.

First, try the user-based approach. Because you are comparing individuals’ ratings, which can be skewed by human bias (that is, each person’s baseline can vary), you use a similarity function called the Pearson similarity (see the equation in the figure), which corrects for user bias by normalizing the ratings (that is, by subtracting each user’s average rating from that user’s individual ratings). Working through the example, you see that Amy’s ratings are most similar to Bill’s ratings, so you can assume Amy’s missing rating would be the same as Bill’s.

Now try the item-based approach. In this approach, you don’t focus on individuals’ ratings but instead on the items’ ratings. And because the items’ ratings are a composite of ratings provided by several individuals, you don’t have to be as concerned about bias, so you can use the cosine similarity function (see the equation in the figure). Here, you see that Cat is most similar to Hedgehog, so you can infer that Amy’s rating for Cat would be the same as her rating for Hedgehog.

Finally, try the content-based approach. This approach doesn’t require the ratings of community members. Instead, an expert has labeled the data—in this case, the attributes (cute, clean, cuddly, loyal) of each pet type. If you know an individual’s preference for each attribute, you can use the cosine similarity function to predict the pets that the individual is most likely to enjoy. In this example, Amy is most likely to enjoy, in order of descending preference, Hedgehog, Rabbit, Dog, Pig, then Cat.

Let’s get into the math a bit. As an example, to determine Amy’s score for Hedgehog, you find the similarity between Hedgehog’s pet attributes and Amy’s ratings of how important each pet attribute is to her:

  • The Hedgehog’s vector is (4, 3, 1, 1)
  • Amy’s vector is (3, 3, 2, 1)
  • You need to find the similarity between these two vectors
  • Cosine similarity = [(4)(3) + (3)(3) + (1)(2) + (1)(1)] / [SQRT(4^2 + 3^2 + 1^2 + 1^2) * SQRT(3^2 + 3^2 + 2^2 + 1^2)] ≈ 0.96, as verified in the sketch below
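
To sanity-check that arithmetic, here is a minimal Python sketch of the same cosine-similarity calculation (the vectors come straight from the example above):

import math

def cosine_sim(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

hedgehog = [4, 3, 1, 1]  # cute, clean, cuddly, loyal
amy = [3, 3, 2, 1]       # Amy's importance rating for each attribute
print(round(cosine_sim(hedgehog, amy), 2))  # prints 0.96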

For the collaborative method, you use the Pearson equation because it normalizes ratings across users (who can be inconsistent in how they rate). If you have objective ratings (that is, ratings that don’t come from people with different personal scales), you can use cosine similarity instead. Figure 23 shows the solution.

Here are the variables in the equations:

  • u: the user
  • i: the item to be rated
  • N: the number of nearest neighbors
  • j: a neighbor
  • rj,i: j’s rating on i
  • rj bar: the average of j’s ratings
  • ru bar: the average of u’s ratings
  • alpha: a scaling factor for the ratings; 1 means use them as is (there is no single right value for alpha; it’s one of those hyperparameters, described earlier, that an experienced data scientist can adjust to derive better results given the problem objective and context)
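
Putting those variables together, here is a rough Python sketch of the user-based prediction. It assumes the common weighted-average form of the equation (the figure may differ in details), and the data values in the usage example are hypothetical:

import math

def pearson_sim(a, b):
    # Pearson similarity between two users' ratings of the same items:
    # subtract each user's average rating (normalization), then compare
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (math.sqrt(sum((x - mean_a) ** 2 for x in a))
           * math.sqrt(sum((y - mean_b) ** 2 for y in b)))
    return num / den if den else 0.0

def predict_rating(u_mean, neighbors, alpha=1.0):
    # neighbors: one (sim(u, j), rj_i, rj_mean) tuple per nearest neighbor j.
    # Prediction: ru bar plus the similarity-weighted average of the
    # neighbors' normalized ratings, scaled by alpha.
    num = sum(sim * (rj_i - rj_mean) for sim, rj_i, rj_mean in neighbors)
    den = sum(abs(sim) for sim, _, _ in neighbors)
    return u_mean + alpha * (num / den if den else 0.0)

# Hypothetical example: Amy's average rating is 3.0, and her two nearest
# neighbors rated the item 5 and 4 against personal averages of 4.0 and 3.5.
neighbors = [(0.9, 5, 4.0), (0.6, 4, 3.5)]
print(round(predict_rating(3.0, neighbors), 2))  # prints 3.8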

Figure 23: Recommender example: similarity solution.

Example: Eager algorithm using support vector machine (SVM)

Finally, here’s an example of an eager machine learning algorithm called the support vector machine (SVM). In this approach, you want to determine to which group an item belongs, such as whether a new customer ends up being a highly profitable customer or not. To accomplish this using SVM, you need to calculate two sets of parameters:

  • Weights (importance) of each attribute (examples of attributes might be the customer’s income, number of family members, profession, and educational achievement)
  • Support vectors, which are the data points that are nearest to the boundary (called a hyperplane) that separates the groups.

You then plug these parameters into an equation, as Figure 24 shows.

You calculate these parameters from the available data sets; this process is what’s referred to as training the model.


Figure 24: Classification example: SVM problem and intuition.

Figure 25 shows, under the Prediction label, the equation used to make the prediction. The values calculated during the training phase are:

  • The weights (the alphas and thetas), found by minimizing the cost function.
  • The support vectors xi, which are a subset of the training data.

Once the model is trained, you can plug in new values of x (such as the attributes of new customers) and predict the class, h(x), to which these new values of x belong (such as whether the customers are expected to be highly profitable or not).
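
As a concrete (if simplified) illustration, here is a short sketch using scikit-learn’s SVC on hypothetical customer attributes. This is not the exact model in the figure, but the fitted object exposes the same ingredients: the support vectors and the weights learned on them.

from sklearn.svm import SVC

# Hypothetical training data: [income ($K), family size, years of education]
X_train = [[85, 2, 16], [95, 1, 18], [70, 3, 16],
           [30, 4, 12], [40, 5, 12], [25, 4, 10]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = highly profitable customer, 0 = not

model = SVC(kernel="linear")
model.fit(X_train, y_train)          # training calculates the parameters

print(model.support_vectors_)        # support vectors: a subset of the training data
print(model.dual_coef_)              # the learned weights on those support vectors

# Prediction h(x) for a new customer's attributes
print(model.predict([[60, 3, 14]]))  # e.g., [1] -> expected to be highly profitable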


Figure 25: Classification example: SVM solution.

Why AI projects fail

There are common ways that AI projects fail in the business environment, as Figure 26 shows. Any AI framework should address those pitfalls.


Figure 26: Why AI projects fail.

The first driver of failure is either selecting the wrong use case or taking on too many use cases without sufficient capabilities and infrastructure. You can use the criteria described earlier to identify problems that better lend themselves to AI solutions. In addition, it is smart to set up a series of use cases that let capabilities and knowledge be built incrementally and with increasing technical sophistication.

Selecting the right use cases is best done collaboratively with:

  • Line-of-business staff who know the business problems, contexts, and constraints, as well as the hypotheses they want tested.
  • Business analysts who can ask questions that clarify the business intent and requirements, and who can identify the data sources and transformations.
  • Data scientists who can formulate the machine learning and deep learning problem so that models can provide answers to the business’s hypotheses.
  • Data engineers and IT resources who can provide access to the data.

Organizing and orchestrating these types of activities correctly upfront requires experienced cross-functional leaders who understand and can balance business impacts, operational drivers, workflow barriers and opportunities, data needs and constraints, and technology enablers.

The second driver is incorrectly building the AI models themselves. This failure consists of two elements:

  • Even though data science, like other sciences, is experimental in nature (you don’t really know what the data will tell you until you work with it), the approach to data science should be well-defined, should be disciplined, and should speed time-to-value.
  • Good data scientists can quickly experiment and iterate, learn from their experiments, distinguish between promising and ineffective approaches, and research and adapt cutting-edge methods if necessary. Good data scientists build MVPs (minimum viable products) in rapid, parallel fashion.

The third driver is lack of scale to quickly build and improve multiple AI models simultaneously. Frequently, this comes down to data scientists being able to work collaboratively, to reuse data pipelines, workflows, and models/algorithms, and to reproduce model results. Additionally, they need to be able to capture and quickly incorporate operational feedback (in the test, staging, or production environments) to further build scale. Accomplishing this requires both the correct infrastructure environment as well as a right-touch model governance approach.

The fourth driver of failure is an inability to operationalize and monetize AI models. Generally speaking, AI models are developed for one of two purposes:

  • To find previously unidentified insights
  • To automate decision making (for both cost reduction and efficiency/productivity).

Clearly, models that never make it out of the laboratory can’t accomplish these tasks.

Furthermore, not only do the models need to be deployed (that is, made accessible to people or systems), but they must be incorporated into workflows in such a way that they are “used” in operations, and exceptions (such as when models cannot make decisions with high probability of correctness) must be managed gracefully (such as through human intervention, model retraining, and model rollback). AI operationalization and monetization requires gradual but full model workflow integration, monitoring of data inputs and model performance parameters, and management of frequent model deployments.

How do I AI? An end-to-end AI solution framework

Finally, let’s tie all this together with an example AI solution framework shown in Figure 27.


Figure 27: AI solution framework.

There are four components:

  • Data management
  • Model development
  • Model operationalization
  • Organization and business impact (ensuring that the models are used, affect the business, and improve business metrics)

The first component, data management, is a normal part of current BI environments, so I don’t describe it here.

The second component, model development, consists of two broad areas:

  • Defining and prioritizing use cases that are appropriate for machine learning models.
  • Building the machine learning models at scale.

The third component, model operationalization, not only entails model deployment but also the process of continuous retraining and redeployment, model integration with operational workflows, and integration of operational feedback for model improvement. 

The purpose of all of this is to monetize the models’ capabilities.

Finally, the fourth component, organization and business impact, is simple (and obvious) but vital to the future maturation of an organization’s AI capabilities. The function of this component is to ensure that the AI models are actually used by the lines of business (that is, they trust them and derive value from them) and that they are affecting business outcomes. Without line-of-business buy-in, the AI movement will rarely take flight.

Above these four components in Figure 27 are collaboration groups: IT, data engineers, data scientists, and lines of business. AI is a team sport.

You can take these components and put a reference architecture around them (see Figure 28), adding a component called model governance. Model governance ensures that model reproducibility, data science reusability, and collaboration among data scientists are achieved, and it makes model retraining and rollback possible when required.


Figure 28: AI reference architecture.

Designing and implementing a solution like this reference architecture will support the AI solution framework with robustness, speed to market, and business outcomes.

Copyright © 2019 IDG Communications, Inc.
