6 ways to make machine learning fail

When learning, machine learning will make mistakes. Adopters need to anticipate that, and be careful not to make matters worse through human mistakes by IT and business.

The process of learning in general often means making mistakes and taking the wrong paths, and then figuring out how to avoid these pitfalls in the future. Machine learning is no different.

As you implement machine learning in your enterprise, be careful: Some technology marketing might suggest that the learning gets things right very quickly, which is an unrealistic expectation for the technology. The truth is that there are bound to be mistakes in the machine learning process. And these mistakes can get encoded, at least for a while, in business processes. The result: Those mistakes now happen at scale and often outside immediate human control.

“Eagerness without due diligence can lead to problems that render the benefits of machine learning almost useless,” says Ray Johnson, chief data scientist at SPR Consulting.

Detecting machine learning errors and dealing with them will help you have more success with the technology and meet your machine learning expectations.

Following are some of the issues that can increase and prolong the mistakes that machine learning tools make while they are learning—bad lessons they may never recognize and correct.

Lack of business understanding of the problem makes machine learning fail

Some data workers using machine learning models don’t really understand the business problem that machine learning is trying to solve, and this can introduce errors into the process.

When his team is using a machine learning tool, Akshay Tandon, vice president and head of strategy and analytics with financial services site LendingTree, encourages it to start with a hypothesis statement. The statement should say what problem you’re trying to solve and what models you are going to build to address that problem.

From a statistics side, machine learning tools available today are extremely powerful, Tandon says. That places a higher burden on doing it right, because these powerful tools, if not carefully used, can lead to bad decisions that matter. If data analysis teams are not careful, they can end up with models that do not fit the particular data the team is using or what it’s trying to learn. Rapid deterioration results; things can go very wrong very quickly, he says.

In addition, many business users don’t understand that a model, from the minute it’s put into production, has a certain degradation in quality, Tandon says. As with a car or any other machine, users need to constantly monitor the model and be mindful of how it’s affecting decisions.
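One way to picture that kind of ongoing monitoring is a rolling accuracy check against the accuracy measured at launch. The following is only a minimal sketch in Python, not something Tandon describes: the class name `AccuracyMonitor`, the window size, and the tolerance are all hypothetical choices, and it assumes the true outcome for each prediction eventually becomes known.

```python
from collections import deque


class AccuracyMonitor:
    """Track rolling accuracy of a deployed model against a launch baseline.

    Hypothetical helper for illustration; window and tolerance are assumptions.
    """

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Keep only the most recent `window` outcomes (1 = correct, 0 = wrong).
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(1 if predicted == actual else 0)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_degraded(self):
        """True when rolling accuracy falls below baseline minus tolerance."""
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance


monitor = AccuracyMonitor(baseline_accuracy=0.90, window=10)
for _ in range(10):
    monitor.record(predicted="approve", actual="approve")
healthy = monitor.has_degraded()

# Five recent misses push the ten-prediction window down to 50% accuracy.
for _ in range(5):
    monitor.record(predicted="approve", actual="decline")
degraded = monitor.has_degraded()
```

In practice the right metric, window, and threshold depend on the business decision the model feeds, so treat the numbers here as placeholders.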

Poor data quality can cause machine learning errors

Garbage in, garbage out. If the data quality is not sufficient, machine learning will suffer. Poor data quality is one of the biggest concerns of data managers, and it can jeopardize big data analytics efforts despite the best intentions of data scientists and other professionals who work with information. It can certainly drive machine learning models off the rails.

Organizations frequently overestimate the resiliency of machine learning algorithms and underestimate the effects of bad data. Poor data quality produces bad results and leads to an organization making poorly informed business decisions, Johnson says. The results of these decisions will hurt business performance and make it difficult for future initiatives to get support.

You can detect poor data quality from machine-learning-driven results that just don’t seem to make sense, based on past and current experience.

A proactive approach to addressing the problem is exploratory data analysis (EDA), Johnson says. EDA can identify basic data quality issues such as outliers, missing values, and inconsistent domain values. You can also use techniques such as statistical sampling to determine if there are sufficient instances of data points to adequately reflect population distribution, and to define rules and policies regarding data quality remediation.
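The three basic checks named above (missing values, outliers, and inconsistent domain values) can be sketched in a few lines of Python. This is an illustrative sketch, not Johnson's method: the function names are hypothetical, the outlier test uses the common 1.5 × IQR rule as an assumed choice, and the sample data is made up.

```python
import statistics


def missing_count(values):
    """Count entries that are None or empty strings."""
    return sum(1 for v in values if v is None or v == "")


def iqr_outliers(values, k=1.5):
    """Return numeric values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes at least two numeric values; statistics.quantiles raises otherwise.
    """
    nums = [v for v in values if isinstance(v, (int, float))]
    q1, _, q3 = statistics.quantiles(nums, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in nums if v < low or v > high]


def invalid_domain_values(values, allowed):
    """Return non-missing values that are not in the allowed domain."""
    return {v for v in values if v not in (None, "") and v not in allowed}


# Made-up sample columns for illustration only.
ages = [34, 29, None, 41, 38, 420, 33, 36]
countries = ["US", "us", "CA", None, "U.S."]

n_missing = missing_count(ages)
outliers = iqr_outliers(ages)
bad_codes = invalid_domain_values(countries, allowed={"US", "CA"})
```

Checks like these only flag candidates. Whether a flagged value is an error or a legitimate extreme is a judgment that the data quality rules and policies have to settle.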

Incorrect use of machine learning

“The most common problem we still see from companies is the desire to use [machine learning] for no other reason other than it is in vogue,” says Sally Epstein, a specialist machine learning engineer with consulting firm Cambridge Consultants. But it must be the right application of the tool to be successful, she says. And traditional engineering approaches might provide a solution faster and for considerably less cost.

Using machine learning when it might not be the best choice for solving a problem, and not fully understanding the use case, can result in solving the wrong problem, Johnson says.

In addition, addressing the wrong problem will lead to lost opportunities, as organizations struggle to tailor their use case to a specific, ill-fitting model. This includes personnel and infrastructure resources wasted on obtaining a result that could have been realized using simpler alternative approaches.

To avoid the incorrect use of machine learning, consider the desired business outcome, the complexity of the problem, the data volume, and the number of attributes. Relatively simple problems such as classification, clustering, and association rules using small amounts of data with a few attributes can be approached visually or via statistical analysis, Johnson says. In those cases, deploying machine learning can take more time and resources than are needed.

When the volume of data becomes unwieldy, machine learning might be more appropriate. But it’s not uncommon to go through a machine learning exercise and then find that the business outcome has not been clearly defined, so the effort ends up solving the wrong problem.

Machine learning models can be biased

Using a poor-quality data set can lead to misleading conclusions. Not only can it introduce inaccuracies and missing data, but it can introduce biases as well. People are certainly capable of bias, so it stands to reason that the models created or inspired by people can also contain biases.

Each machine learning algorithm has different sensitivities to imbalanced classes or distributions, Epstein says. If these aren’t addressed, you might end up with, for example, facial recognition tools that have dependencies on skin color, or models with gender bias. In fact, that has already happened with several commercial services.
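A first, very rough check for imbalanced classes is simply to measure how each class or group is represented in the training data. The sketch below is a hedged Python illustration, not something Epstein prescribes: the function names and the 20 percent threshold are hypothetical, and a balanced data set does not by itself guarantee an unbiased model.

```python
from collections import Counter


def class_shares(labels):
    """Return each label's share of the data set as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def underrepresented(labels, min_share=0.2):
    """List labels whose share falls below min_share (an assumed threshold)."""
    shares = class_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Made-up labels: 90 examples from group "a", 10 from group "b".
labels = ["a"] * 90 + ["b"] * 10
shares = class_shares(labels)
flagged = underrepresented(labels)
```

A flagged group is a prompt to collect more data, reweight, or evaluate the model's error rates per group before trusting its output.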

The accuracy of a conclusion, whether that of an algorithm or a person, depends on the breadth and quality of the information being processed. The financial, legal, and reputational risks of algorithmic bias that organizations and individuals face are an example of why any company using machine learning should make ethics an organizational imperative, says Vic Katyal, principal of the advisory analytics service area at consulting firm Deloitte.

Signs of algorithmic bias have been well-documented in the public sphere across areas such as credit scoring, education curriculums, hiring, and criminal justice sentencing, Katyal says. Poorly collected, curated, or applied data can introduce bias in even the most well-designed and well-intended machine learning applications.

Inherently biased machine learning systems threaten to disadvantage segments of customers or societal stakeholders, and can create or perpetuate unfair outcomes, he says.

Consulting firm McKinsey & Company notes in a 2017 report that algorithmic bias is one of the biggest risks of machine learning, because it compromises the actual purpose of machine learning. It’s an often-overlooked defect that can trigger costly errors, the firm says, and if left unchecked can pull projects and organizations in entirely wrong directions.

Effective efforts to confront the problem at the outset will pay off handsomely, McKinsey says, allowing the true potential of machine learning to be realized most efficiently.

Insufficient resources to do machine learning well

When launching a machine learning initiative, organizations can easily underestimate the resources they need for personnel and infrastructure. There can be substantial infrastructure requirements for machine learning, especially in the cases of image, video, and audio processing.

Without the required processing power, developing machine-learning-based solutions in a timely fashion might be difficult at best, if not impossible, Johnson says.

There is also the issue of deployment and consumption. What good is developing a machine learning solution if the prerequisite infrastructure is not in place to allow for its deployment and the consumption of results by users?

Deploying a scalable infrastructure to support machine learning can be expensive and difficult to maintain. However, there are several cloud services that provide scalable machine learning platforms that can be provisioned on demand. The cloud approach allows experimentation with machine learning at scale without the shackles of physical hardware acquisition, configuration, and deployment, Johnson says.

Some organizations want to have their infrastructure in house. If that’s the case, cloud services can serve as a stepping stone and education experience, so those organizations can understand what’s required from an infrastructure perspective before they make that large investment.

From a personnel perspective, the lack of knowledgeable people such as data scientists and machine learning engineers can derail the development and deployment of machine learning. It’s essential to have people who understand machine learning concepts, their application, and the interpretation of results, so they can determine whether specific business outcomes are being achieved.

It can’t be overstated how important it is to have people with machine learning knowledge, Johnson says. Knowledgeable people can help identify data quality issues, ensure proper use and deployment of machine learning tools, and help establish best practices and governance policies.

Poor planning and lack of governance derail machine learning

Machine learning efforts might start off with enthusiasm but then lose momentum and grind to a halt. This is a sign of poor planning and lack of governance.

Machine learning efforts can continue ad infinitum if appropriate guidelines and limits are not put into place, potentially resulting in enormous resource expenditures without achieving any benefits, Johnson says.

Organizations need to keep in mind that machine learning is an iterative process, and modifications to models might happen over time to support changing requirements. As a result, the people working with machine learning might develop a lack of interest in completing the effort, which can lead to poor results. Project sponsors might move on to other endeavors, and the machine learning effort will eventually stall.

Machine learning efforts need to be monitored on a regular basis to keep things moving along, Johnson says. If progress starts slowing down, it might be time to take a break and re-examine the effort.

Copyright © 2018 IDG Communications, Inc.
