Explainable AI: Peering inside the deep learning black box

Why we must closely examine how deep neural networks make decisions, and how deep neural networks can help


The claim that artificial intelligence has a “black box” problem is not entirely accurate. Rather, the problem lies primarily with deep learning, a specific and powerful form of AI that is based on neural networks, which are complex constructions that mimic the cognitive capabilities of the human brain.

With neural networks, the system’s behavior is a reflection of the data the network is trained on and the human labelers who annotate that data. Such systems are often described as black boxes because it is not clear how they use this data to reach particular conclusions, and these ambiguities make it difficult to determine how or why the system behaves the way it does.

Explainability, then, is the ability to peek inside this black box and understand the decision-making process of a neural network. Explainability has important implications as it relates to the ethical, regulatory, and reliability elements of deep learning.

A good example of the black box problem mystified one of our clients in the autonomous vehicle space for months. Minute details notwithstanding, this company encountered some bizarre behavior during the testing of a self-driving car, which began to turn left with increasing regularity for no apparent reason. The designers of the system could make no sense of the behavior.

After months of painful debugging, using DarwinAI’s Generative Synthesis technology, the system’s architects finally uncovered the cause of the problem—the color of the sky. Because the training for certain turning scenarios had been conducted in the desert when the sky was a particular hue, the neural network had established a correlation between turning left and the lighting conditions. But due to the opaque nature of deep learning, the engineers had no way of uncovering such behavior except to test it in the real world and chance upon the problem. 

As illustrated by this little anecdote, the black box problem makes it difficult to construct reliable and robust neural networks, which can be especially crucial for mission-critical applications when lives are at stake. Specifically, if you don’t know how a neural network reaches its decisions, you don’t know when it will fail. And if you don’t know when it will fail, you can never be sure you’ve eliminated biased or catastrophic edge cases in the neural network’s behavior.

Technical benefits of explainable AI 

From a technical perspective, the ability to peek inside the inner workings of a neural network and demystify its behavior provides three main benefits to architects and developers:

Detecting and eradicating edge cases and data bias

Insight into how a neural network is reaching its decisions would allow developers to detect problematic boundary cases in the system and also identify biases in the data used to train it. 

For instance, in a well-documented example, an image classification network became extremely adept at identifying horses. The system was the pride of its designers, until the key to its effectiveness was uncovered: Because pictures of horses are frequently copyrighted, the network was searching for the © symbol to classify these animals. Inventive, to be sure, but an accident waiting to happen.

In a second example, the COMPAS parole algorithm received negative press in 2016 when it was reported that the software, which predicts whether offenders are likely to reoffend, was biased against African Americans. The reason? The system was trained using historical data, and thus mirrored prejudices in the judicial system. In another recent and widely publicized example, a recruiting tool created by Amazon began favoring male candidates over female candidates as a result of the historical data it was fed.

In these scenarios, explainability would allow designers to identify such problems during model development and eliminate problematic triggers (e.g., race, copyright symbols, color of the sky) from consideration in the decision process. 
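One generic way to surface a problematic trigger of this kind is permutation importance: shuffle a single input feature and measure how far accuracy falls. The sketch below is only an illustration under stated assumptions. The toy data, the two feature meanings, and `shortcut_model` are hypothetical stand-ins, not the systems described above.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(rows)

def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop observed when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild every row with feature j replaced by its shuffled value.
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops

# Hypothetical toy data (made up for illustration):
# feature 0 = "copyright mark present", feature 1 = "animal shape score".
rows = [[1, 0.9], [1, 0.2], [1, 0.7], [0, 0.8],
        [0, 0.1], [0, 0.3], [1, 0.6], [0, 0.4]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = horse

def shortcut_model(row):
    # Stand-in for a trained network that latched onto the watermark.
    return 1 if row[0] == 1 else 0

drops = permutation_importance(shortcut_model, rows, labels, n_features=2)
for name, drop in zip(["copyright mark", "shape score"], drops):
    print(f"{name}: accuracy drop {drop:.2f}")
```

A feature whose shuffling never changes accuracy is one the model ignores. A model that depends only on a feature that should be irrelevant, such as the watermark here, is a candidate for the kind of review described above.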

Improving model accuracy and performance

Illuminating deep learning’s black box can also help to improve model accuracy and performance.

Quite often, the process of designing a deep neural network begins by choosing a popular public reference model (such as Inception, MobileNet, or YOLO), which is then extended and trained for specific tasks. In such scenarios, a developer often doesn’t understand which parts of the underlying network are most critical to the tasks at hand.

Explainability at the technical level – the extent to which specific layers and even individual neurons are involved in a particular task – would allow a developer to modify parts of the network to improve accuracy and remove extraneous components. The latter has the additional benefit of potentially improving performance and facilitating faster inference. 
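A simple, generic way to estimate how much a component contributes is ablation: switch off one unit at a time and measure the change in accuracy. What follows is a minimal sketch using a hand-written network with three hidden units. The weights and data are invented for illustration, and real tools would work on trained networks with far more units.

```python
def relu(x):
    return x if x > 0 else 0.0

def predict(x, w_hidden, w_out, disabled=()):
    """Forward pass of a one-hidden-layer network; units in `disabled` output 0."""
    hidden = []
    for i, weights in enumerate(w_hidden):
        activation = relu(sum(w * v for w, v in zip(weights, x)))
        hidden.append(0.0 if i in disabled else activation)
    score = sum(w * h for w, h in zip(w_out, hidden))
    return 1 if score > 0.5 else 0

def accuracy(inputs, labels, w_hidden, w_out, disabled=()):
    hits = sum(predict(x, w_hidden, w_out, disabled) == y
               for x, y in zip(inputs, labels))
    return hits / len(inputs)

def ablation_report(inputs, labels, w_hidden, w_out):
    """Accuracy lost when each hidden unit is disabled on its own."""
    base = accuracy(inputs, labels, w_hidden, w_out)
    return [base - accuracy(inputs, labels, w_hidden, w_out, disabled=(i,))
            for i in range(len(w_hidden))]

# Hypothetical weights: units 0 and 1 carry the signal, unit 2 barely matters.
w_hidden = [[1.0, 0.0], [0.0, 1.0], [0.01, 0.01]]
w_out = [1.0, 1.0, 0.01]

inputs = [[1, 0], [0, 1], [0, 0], [1, 1]]
labels = [1, 1, 0, 1]

drops = ablation_report(inputs, labels, w_hidden, w_out)
for unit, drop in enumerate(drops):
    print(f"unit {unit}: accuracy drop {drop:.2f}")
```

Units whose removal costs nothing are candidates for pruning, which is where the potential inference speedup comes from.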

Reducing the amount of labeled data required to (effectively) train the network

A final technical benefit of explainability is its potential to reduce the amount of labeled data (which often must be purchased) required to train a neural network for specific use cases. An example will help to illustrate the point.

Oftentimes, a model will perform poorly for a specific use case. For example, the perception network for one of our autonomous vehicle clients was particularly bad at detecting bicycles. Throwing more data at the problem typically rectifies such a shortcoming. For example, retraining the model with 50,000 additional bicycle images should do the trick.

Explainability could more precisely characterize such shortcomings. For example, it could reveal that the neural network is bad at detecting bicycles in this orientation at this time of day. In this way, the amount of labeled data required to correct the problem could be dramatically reduced (e.g. by purchasing images with bicycles in this particular orientation and this time of day).
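Even without a dedicated explainability tool, a rough version of this characterization is possible when evaluation images carry metadata. The sketch below groups results by slice and finds the weakest one. It assumes each evaluation record is tagged with an orientation and a time of day; the field names and the sample records are hypothetical.

```python
from collections import defaultdict

def accuracy_by_slice(records, keys):
    """Group evaluation records by the given metadata keys; return accuracy per group."""
    totals = defaultdict(lambda: [0, 0])  # slice -> [correct, seen]
    for rec in records:
        slice_id = tuple(rec[k] for k in keys)
        totals[slice_id][0] += int(rec["correct"])
        totals[slice_id][1] += 1
    return {s: correct / seen for s, (correct, seen) in totals.items()}

# Hypothetical bicycle-detection results: (orientation, time of day, detected?)
raw = [
    ("side", "day", True), ("side", "day", True),
    ("side", "dusk", True), ("side", "dusk", False),
    ("head-on", "day", True), ("head-on", "day", True),
    ("head-on", "dusk", False), ("head-on", "dusk", False),
]
records = [{"orientation": o, "time_of_day": t, "correct": c} for o, t, c in raw]

report = accuracy_by_slice(records, ["orientation", "time_of_day"])
worst = min(report, key=report.get)  # the slice with the lowest accuracy
print("weakest slice:", worst, "accuracy:", report[worst])
```

The weakest slice indicates which images are worth acquiring, rather than buying a large undifferentiated batch.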

Explainable AI for business stakeholders

An important point to note is that explainability can mean very different things to an engineer versus a business user, and that the granular data points communicated to the engineer would have to be translated into a higher-level representation that makes sense to the business person. 

It is therefore useful to consider the black box problem as it relates to non-technical individuals, specifically the business stakeholders who ultimately use and support such systems.

In the main, the inscrutability of neural networks presents two challenges for stakeholders outside the engineering team. First, identifying the risk and bias in such models is extremely challenging, as they are opaque. Second, the incomprehensibility of neural network models makes it difficult to extract concrete business insights from the results they produce even when they function correctly.

Uncovering data bias and problematic correlation in deep learning

A deep learning system is only as good as the data upon which it is trained. It is not uncommon for a neural network to produce biased results based on skewed data sets.

As mentioned, the COMPAS parole algorithm became the subject of close scrutiny when it started producing biased results against African Americans. Likewise, the aforementioned autonomous vehicle example illustrates the dangers when AI draws nonsensical correlations among the data it is given.

Explainability would allow business stakeholders to more readily and explicitly address such problems. This would not be done at a technical level, as in the case of their engineering colleagues, but in their particular domain of expertise. In the first case, for example, a legal administrator could eliminate race as a determining factor in parole sentencing after an explainability tool identified its usage.

Applying risk management to deep learning models

The black box problem presents a second challenge to business stakeholders with respect to risk management. Just as with developers, the opaqueness of models makes it difficult for a business stakeholder to understand the strengths and limitations of an AI algorithm. As a result, it becomes difficult for the business to construct mitigation strategies to compensate for such shortcomings. 

Explainability would allow stakeholders to construct preventive measures through competing AI systems, more traditional algorithmic approaches, or non-AI and manual processes. It would allow the business to better gauge and manage risk, as business managers would have a more quantifiable understanding of how the AI reached a decision.

Gaining business insights and improving processes

High-level insights about how neural networks are reaching their decisions can do more than facilitate compensatory measures. They can also give a business stakeholder visibility into new and non-obvious correlations that can be used to improve existing business processes.

Recently, for example, an AI travel system proved remarkably adept at predicting the hotels a given customer would prefer. One of the correlations the neural network identified, and one never appreciated by human operators, was the hotel’s proximity to a certain street corner. This variable, when combined with other considerations, was a major influence on hotel preference.

The important point is that identifying this variable through explainable AI allowed the business to modify its process in a productive fashion—in this case, exposing proximity to this street corner to the customer as an explicit choice.

In sum, explainable AI allows businesses to improve and strengthen their own processes by leveraging new correlations that are uncovered by deep learning. As neural networks excel at this task, explainability can provide significant potential value in such areas. 

Bringing explainability to deep neural networks

To review, the black box problem exists on a technical level because of the sheer complexity of deep neural networks. With hundreds of layers and millions (sometimes billions) of parameters, it is simply not plausible for a human to unravel the inner workings of deep neural networks and understand how they make their decisions.

One way to combat this complexity, and the approach we use at DarwinAI, is, ironically, to use AI itself.

Generative Synthesis, our core technology for deep learning design, is the byproduct of years of research by our academic team, whose previous accolades include two award-winning papers at workshops at the Neural Information Processing Systems (NIPS) conference in 2016 and 2017, respectively. 

Specifically, Generative Synthesis uses a neural network to probe and understand a neural network, identifying and correcting inefficiencies in the network that would be impossible for a human to uncover. It then generates a number of new and highly optimized versions of that neural network. More importantly, the understanding garnered by Generative Synthesis during this process enables multiple levels of explainable deep learning. Below are some examples of the types of explainability that our Generative Synthesis technology facilitates.

Root cause analysis of neural network predictions

One way of understanding a neural network’s behavior is through root cause analysis, which involves identifying those input factors that most influence a particular decision. This technique is illustrated below.

Figure 1. Generative Synthesis sheds light on the classification of handwritten digits by a deep neural network. (Image: DarwinAI)

Figure 1 illustrates a relatively simple task for a neural network, that of classifying a handwritten numerical digit. In this instance, the Generative Synthesis technology highlights the specific areas that most influence a classification. In the case of the “nine” and “three” digits on the left, the reasoning is clear as evidenced by the red and blue highlighted areas in each image, respectively. These areas of the image were most responsible for the neural network classifying the digits in this way.

The real benefit of explainability, however, is illustrated by the thornier case of the “seven” digit on the right.  In this case Generative Synthesis is not only able to identify where the neural network got it right (highlighted yellow area), but it is also able to pinpoint areas where the network got it wrong (red and blue areas), which, under different circumstances, might have resulted in an incorrect prediction (wrong digit classification).

Understanding such nuances—the legitimate and problematic pathways behind a deep neural network decision—becomes especially powerful in more complex scenarios. A good example is illustrated in Figure 2.

Figure 2. Generative Synthesis performs root cause analysis of an image classification error. (Image: DarwinAI)

In this case, the neural network incorrectly classified this image as a hammer, a mystifying choice to most human beings. Using root cause analysis, Generative Synthesis is able to locate the areas of the image that most influenced this prediction (via the bounding box shown in Figure 2). This visualization illuminates matters, as the junction of the corner of the bench and its leg does plausibly look like a hammer.

In some sense, root cause analysis of deep learning is not unlike introspection and debugging of classical computer code. Both allow an engineer to chart and diagnose the underlying causes of problematic behavior in order to correct it.   
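The highlighted regions in Figures 1 and 2 come from Generative Synthesis, whose internals are not described here. As a rough, generic analogue, occlusion sensitivity asks a similar question: blank out one region of the input at a time and record how much the score for the predicted class falls. The sketch below assumes a tiny 3x3 "image" and a hypothetical hand-written scoring function, `seven_score`, standing in for a trained classifier.

```python
def occlusion_map(score_fn, image, patch=1, fill=0.0):
    """Score drop when each patch x patch region of the image is replaced by `fill`."""
    height, width = len(image), len(image[0])
    base = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            cells = [(r, c)
                     for r in range(top, min(top + patch, height))
                     for c in range(left, min(left + patch, width))]
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r, c in cells:
                occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r, c in cells:
                heat[r][c] = drop
    return heat

# Hypothetical stand-in for a classifier's "seven" score: it relies mostly on
# the top stroke and slightly on the upper part of the right-hand stroke.
WEIGHTS = [[1.0, 1.0, 1.0],
           [0.0, 0.0, 0.5],
           [0.0, 0.0, 0.0]]

def seven_score(image):
    return sum(w * p
               for w_row, p_row in zip(WEIGHTS, image)
               for w, p in zip(w_row, p_row))

image = [[1.0, 1.0, 1.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]

heat = occlusion_map(seven_score, image)
for row in heat:
    print(row)
```

Large values in the resulting map mark the pixels the score depends on most. Zeros mark pixels the model ignores, even when they are part of the digit.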

Explaining neural network performance
