Mitigating the risks of the AI black box

If we don’t understand how machine learning works, how can we trust it? Increasing model transparency creates risks as well as rewards

Enterprises are placing their highest hopes on machine learning. However, machine learning, which sits at the heart of artificial intelligence (AI), is also starting to unnerve many enterprise legal and security professionals.

One of the biggest concerns around AI is that complex ML-based models often operate as “black boxes.” This means the models—especially “deep learning” models composed of artificial neural networks—may be so complex and arcane that they obscure how they actually drive automated inferencing. Just as worrisome, ML-based applications may inadvertently obfuscate responsibility for any biases and other adverse consequences that their automated decisions may produce.

To mitigate these risks, people are starting to demand greater transparency into how machine learning operates in practice and throughout the entire workflow in which models are built, trained, and deployed. Innovative frameworks for algorithmic transparency—also known as explainability, interpretability, or accountability—are gaining adoption among working data scientists. Chief among these frameworks are LIME, Shapley, DeepLIFT, Skater, AI Explainability 360, What-If Tool, Activation Atlases, InterpretML, and Rulex Explainable AI.
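Several of the frameworks named above, notably LIME and Shapley-value methods, share one core idea: treat the model as a black box, perturb its inputs, and watch how the outputs move. The sketch below illustrates that idea in miniature. It is a simplified, hypothetical illustration, not the actual LIME or SHAP algorithm; the `black_box_model` scoring function and its hidden weights are invented stand-ins for a trained classifier.

```python
import random

# Hypothetical black-box model: a simple scoring function standing in for a
# trained classifier. The explainer never sees the hidden weights; it can
# only query inputs and observe outputs, as with any opaque ML model.
def black_box_model(features):
    hidden_weights = {"income": 0.6, "age": 0.1, "debt": -0.3}
    return sum(hidden_weights[k] * v for k, v in features.items())

def perturbation_importance(model, instance, n_samples=200, scale=0.1, seed=0):
    """Estimate each feature's local importance by nudging it with random
    noise and measuring how much the model's output shifts on average --
    the core intuition behind perturbation-based explainers."""
    rng = random.Random(seed)
    baseline = model(instance)
    importance = {}
    for feature in instance:
        total = 0.0
        for _ in range(n_samples):
            perturbed = dict(instance)
            perturbed[feature] += rng.gauss(0, scale)
            total += abs(model(perturbed) - baseline)
        importance[feature] = total / n_samples
    return importance

scores = perturbation_importance(black_box_model,
                                 {"income": 1.0, "age": 1.0, "debt": 1.0})
```

Because the explainer relies only on query access, it works for any model, which is precisely why it is popular and, as discussed below, why adversaries who control the model's responses to perturbed queries can manipulate the explanations it produces.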

All these tools and techniques help data scientists generate “post-hoc explanations” of which particular data inputs drove which particular algorithmic inferences under various circumstances. However, recent research shows that these frameworks can be hacked, reducing trust in the explanations they generate and exposing enterprises to the following risks:

  • Algorithmic deceptions may sneak into the public record. Unscrupulous parties may hack the narrative explanations that these frameworks generate, perhaps for the purpose of misrepresenting or obscuring any biases in the machine learning models being described. In other words, “perturbation-based” approaches such as LIME and Shapley can be tricked into generating “innocuous” post-hoc explanations for algorithmic behaviors that are unambiguously biased.
  • Technical vulnerabilities may be disclosed inadvertently. Exposing information about machine learning algorithms can make them more vulnerable to adversarial attacks. Full visibility into how machine learning models operate may expose them to attacks that are designed either to trick how they make inferences from live operational data or to poison them at the outset by injecting bogus data into their training workflows.
  • Intellectual property theft may be encouraged. Entire machine learning algorithms and training data sets can be stolen from their explanations alone, as well as through their APIs and other features. Transparent explanation of how machine learning models operate may enable unauthorized third parties to reconstruct the underlying models with full fidelity. Similarly, transparency may make it possible to partially or entirely reconstruct training data sets, an attack known as “model inversion.”
  • Privacy violations may run rampant. Machine learning transparency may make it possible for unauthorized third parties to ascertain whether a particular individual’s data record was in a model’s training data set. This adversarial tactic, known as a “membership inference attack,” may enable hackers to unlock considerable amounts of privacy-sensitive data.
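The membership inference attack described in the last bullet exploits a common weakness: overfit models tend to be markedly more confident on records they saw during training than on unseen records. The toy sketch below shows the attacker's side of that exchange. It is a hypothetical illustration under stated assumptions: the `make_overfit_model` closure is an invented stand-in for a deployed model's prediction API, and the confidence values and threshold are illustrative.

```python
def make_overfit_model(training_set):
    # Stand-in for an overfit classifier's prediction API: it reports much
    # higher confidence on records it memorized during training than on
    # records it has never seen. (Values are illustrative.)
    def confidence(record):
        return 0.99 if record in training_set else 0.55
    return confidence

def infer_membership(confidence_fn, record, threshold=0.9):
    """Membership-inference attack: flag a record as 'probably in the
    training set' whenever the model is suspiciously confident about it.
    The attacker needs only query access to the confidence score."""
    return confidence_fn(record) >= threshold

# The attacker probes the model with candidate records.
train = {("alice", 34), ("bob", 51)}
model = make_overfit_model(train)
```

Real attacks are more sophisticated, typically training "shadow models" to calibrate the threshold, but the privacy exposure is the same: confidence scores surfaced in the name of transparency become a side channel revealing who was in the training data.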