3 kinds of bias in AI models — and how we can address them

A biased AI model must have learned a biased relationship between its inputs and outputs. We can fix that.

Automated decision-making tools are increasingly common. However, many of the machine learning (ML) models behind them, from facial recognition systems to online advertising, show clear evidence of racial and gender bias. As ML models become more widely adopted, special care and expertise are needed to ensure that artificial intelligence (AI) improves the bottom line fairly.

ML models should target and eliminate biases rather than exacerbate discrimination. But in order to build fair AI models, we must first build better methods to identify the root causes of bias in AI. We must understand how a biased AI model learns a biased relationship between its inputs and outputs.

Researchers have identified three categories of bias in AI: algorithmic prejudice, negative legacy, and underestimation. Algorithmic prejudice occurs when there is a statistical dependence between protected features and other information used to make a decision. Negative legacy refers to bias already present in the data used to train the AI model. Underestimation occurs when there is not enough data for the model to make confident conclusions for some segments of the population.

Let’s delve into each of these. 

Algorithmic prejudice

Algorithmic prejudice stems from correlations between protected features and other factors. When this happens, we cannot reduce bias simply by removing the protected characteristics from our analysis because the correlation may lead to biased decisions based on non-protected factors. 

For example, early predictive policing algorithms did not have access to racial data when making predictions, but the models relied heavily on geographic data (e.g., zip code), which is correlated with race. In this way, models that are “blind” to demographic data like gender and race can still encode this information through other features that are statistically correlated with protected attributes.
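One rough way to check whether a feature acts as a proxy is to ask how much better we can guess the protected attribute once we know the feature. The sketch below is only an illustration: the function name `proxy_strength`, the zip codes, and the group labels are all made up, and a real audit would use held-out data and a stronger predictor.

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, protected_values):
    """Compare two guessing strategies for the protected attribute.

    baseline:     always guess the most common group overall
    with_feature: guess the most common group among records sharing the feature value
    A large gap suggests the feature leaks the protected attribute.
    """
    n = len(protected_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n

    # Count group membership separately for each feature value.
    by_feature = defaultdict(Counter)
    for feature, group in zip(feature_values, protected_values):
        by_feature[feature][group] += 1

    # Records guessed correctly when we pick the majority group per feature value.
    correct = sum(counts.most_common(1)[0][1] for counts in by_feature.values())
    return baseline, correct / n

# Hypothetical records: the model never sees `groups`, only `zip_codes`.
zip_codes = ["15213", "15213", "15213", "15217", "15217", "15217", "15206", "15206"]
groups = ["A", "A", "A", "B", "B", "A", "B", "B"]

baseline, with_zip = proxy_strength(zip_codes, groups)
print(f"guess without zip code: {baseline:.3f}, with zip code: {with_zip:.3f}")
```

In this toy data, knowing the zip code lifts the guess well above the baseline, which is the signature of a proxy feature.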

The Consumer Financial Protection Bureau, which works to ensure that lenders comply with fair lending laws, has found that statistical methods combining geography and surname-based information yield a highly reliable proxy probability for race and ethnicity. These findings refute the prevalent misconception that an algorithm will automatically be less biased if it isn’t given access to protected attributes. This phenomenon, known as proxy discrimination, can be mitigated once the root cause is identified. That is, violations can be repaired by locating the intermediate computations within a model that create the proxy feature and replacing them with values that are less correlated with the protected attribute.
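To see how two weak signals can combine into a strong proxy, consider a simple Bayes' rule update. This is a sketch under a stated assumption (surname and geography are treated as independent given the group), not necessarily the Bureau's exact procedure, and every probability and name below is invented for illustration.

```python
def combine_proxies(p_group_given_surname, p_geo_given_group):
    """Bayes' rule: P(group | surname, geo) is proportional to
    P(group | surname) * P(geo | group), assuming surname and geography
    are conditionally independent given the group (an assumption of this sketch).
    """
    unnormalized = {
        group: p_group_given_surname[group] * p_geo_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}

# Hypothetical inputs: the surname alone leans toward group_a ...
surname_prior = {"group_a": 0.6, "group_b": 0.4}
# ... but this neighborhood is four times as likely for a group_b resident.
geo_likelihood = {"group_a": 0.01, "group_b": 0.04}

posterior = combine_proxies(surname_prior, geo_likelihood)
print(posterior)
```

Neither input names a protected class directly, yet the combined estimate shifts firmly toward one group.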

Counterintuitively, the naive solution of removing protected features from model training can actually hurt already disadvantaged groups in certain cases. In the US judicial system, for instance, correctional authorities and parole boards use checklists of risk factors to make fair decisions about incarceration and release. When both humans and AI models have basic information like gender, age, current charge, and number of prior adult and juvenile offenses, humans and models perform comparably. 

However, when both humans and models were given 10 additional risk factors related to education and substance use, researchers found that the machine learning models were more accurate and less prone to bias. This underscores the need to understand the root cause of an AI model’s bias instead of blindly employing remediation strategies.

Negative legacy

It’s also possible that an algorithm’s bias stems directly from an analogous bias present in its training data. For instance, ML models trained to perform language translation tasks have tended to associate female names with attributes like “parents” and “weddings,” while male names had a stronger association with words like “professional” and “salary.” It is unlikely that the model picks up this association on its own; rather, it is trained on a corpus of text that reflects these gender tropes. This is an example of negative legacy.

Within natural language processing, gender bias is a troubling but well-studied problem: A clear understanding of its cause presents avenues to correct it. In languages like English, where nouns and adjectives tend not to be gendered, researchers have found ways to constrain word embeddings to remain gender-neutral. In other cases where language is inherently gendered, language corpora can be augmented to prevent bias by introducing new examples that break causal associations between gendered and gender-neutral words.
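The corpus augmentation idea can be sketched as adding, for each sentence with a gendered word, a copy with that word swapped. The word list and function names here are hypothetical and deliberately tiny; ambiguous words such as “her” are left out because they need grammatical context to swap correctly.

```python
import re

# Hypothetical, deliberately small swap table (both directions listed explicitly).
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}
# Match whole words only, so "The" is not mistaken for "he".
PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", flags=re.IGNORECASE)

def swap_gendered_words(sentence):
    """Return the sentence with each listed gendered word replaced by its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Preserve capitalization at the start of a sentence.
        return swapped.capitalize() if word[0].isupper() else swapped
    return PATTERN.sub(replace, sentence)

def augment(corpus):
    """Keep every original sentence and add a swapped copy when one exists."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = swap_gendered_words(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented

corpus = [
    "He negotiated a higher salary.",
    "The mother planned the wedding.",
    "The report is due Friday.",
]
augmented = augment(corpus)
for sentence in augmented:
    print(sentence)
```

After augmentation, “salary” and “wedding” each appear alongside both genders, so the corpus no longer ties them to one.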

In other application areas, negative legacy can be one of the hardest types of bias to mitigate, as bias is inherently built into the dataset that the machine learning model learns from. As such, the model can codify years of systemic bias against a population. Redlining, for example, the practice of systematically denying loans to people based on where they live, can bias loan approval datasets toward white applicants. This bias in the data then leads to biased behavior of the AI model.

Although existing bias mitigation strategies might try to boost credit acceptance rates for Black applicants, this might obscure the true cause of the model’s bias and make it difficult to address the underlying issue. FICO scores, commonly used as inputs in credit decisions, have been shown to exhibit racial discrimination. In this case, post-hoc bias mitigation strategies would be less effective than seeking out alternative data sources that also exhibit causal connections to creditworthiness. Thus negative legacy could be mitigated by finding alternative data.


Underestimation

Just as data can be biased, it can also be insufficient. Without enough data, machine learning models can fail to converge or provide reliable predictions. This is the problem of underestimation. Amazon trained a machine learning model to screen applicants in its hiring process, but like many other tech companies, Amazon has a disproportionately male workforce. This data imbalance made its AI model more confident when evaluating men, leading to stronger recommendations for male applicants. Recognizing the bias in the model’s recommendations, Amazon removed the model from its recruiting pipeline.

Amazon may have been able to build an unbiased recruiting tool had they sought out more or better data, but without a proper understanding of why the bias arose, this would have been impossible. In the case of underestimation, a model’s certainty of its predictions can be analyzed across subgroups of the population, and the underlying dataset can be diversified by automatically augmenting it with new instances.
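Analyzing certainty across subgroups can be as simple as averaging a confidence score per group. In the sketch below, confidence is taken as the distance of a predicted probability from a coin flip, expressed as max(p, 1 - p); that choice, the function name, and the numbers are all assumptions made for illustration.

```python
from collections import defaultdict

def mean_confidence_by_group(probabilities, groups):
    """Average model confidence per subgroup.

    Confidence for one prediction is max(p, 1 - p): 0.5 means the model is
    guessing, 1.0 means it is certain (an assumed measure for this sketch).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p, group in zip(probabilities, groups):
        totals[group] += max(p, 1 - p)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical predicted probabilities of "recommend to hire".
probabilities = [0.95, 0.90, 0.05, 0.10, 0.60, 0.45]
applicant_groups = ["men", "men", "men", "men", "women", "women"]

confidence = mean_confidence_by_group(probabilities, applicant_groups)
print(confidence)
```

A group with few training examples tends to show confidence near 0.5, which flags where additional data is most needed.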

Measures of model certainty and stability on a population are critical to understanding whether a model is even prepared to make credible predictions for all groups of people. In the case of underestimation, the dataset provided isn’t sufficiently expressive to capture the nuances of the population it describes. In this situation, adversarial training techniques to promote fairness or post-hoc bias mitigation strategies will likely not be as successful as augmenting the dataset to be more comprehensive.

It is no secret that algorithms can encode and perpetuate bias, and this can have devastating consequences. But while this paints a stark picture, it is important to remember that algorithmic bias (unlike human bias) is ultimately quantifiable and fixable if dealt with appropriately. Instead of taking a blind approach to reducing AI bias, a precise understanding of the true causes behind bias is essential to deploying safe and trustworthy AI.

While these causes are complex, researchers continue to develop better ways to measure disparate outcomes for specific groups, identify specific features that cause these differences, and choose reasonable mitigation strategies for specific sources of bias. As more decisions are automated, we must combat bias at its roots in order to create fair and equitable models.
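Measuring disparate outcomes often starts with comparing the rate of favorable decisions across groups. The sketch below computes per-group selection rates and their ratio; the function names and the data are hypothetical, and this is only one of several fairness measures in use.

```python
from collections import defaultdict

def selection_rates(decisions, groups):
    """Fraction of favorable decisions (1 = favorable, 0 = unfavorable) per group."""
    favorable = defaultdict(int)
    counts = defaultdict(int)
    for decision, group in zip(decisions, groups):
        favorable[group] += decision
        counts[group] += 1
    return {group: favorable[group] / counts[group] for group in counts}

def disparate_impact_ratio(rates):
    """Lowest selection rate divided by the highest; 1.0 means equal rates."""
    return min(rates.values()) / max(rates.values())

# Hypothetical model decisions for two groups of four applicants each.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
decision_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(decisions, decision_groups)
ratio = disparate_impact_ratio(rates)
print(rates, round(ratio, 3))
```

A ratio far below 1.0 signals a gap worth investigating, though it does not by itself say which of the three kinds of bias is responsible.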

Anupam Datta is a professor of electrical and computer engineering at Carnegie Mellon University and chief scientist of Truera. Divya Gopinath, research engineer at Truera; Mesi Kebed, engineer at Truera; Shayak Sen, chief technical officer at Truera; and John C. Mitchell, professor of computer science and electrical engineering at Stanford University, contributed to this article.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2021 IDG Communications, Inc.