Best practices for developing governable AI

Focus on these engineering best practices to build high-quality models that can be governed effectively.

Building and deploying strong, robust artificial intelligence (AI) and machine learning (ML) models is complex and challenging work. If you are like many of the data science and machine learning leaders I have spoken with lately, you are having conversations with other teams about the governance of your systems.

It’s hard to have those conversations while also doing your job of getting models into production. So let’s talk about what you can do as a technical organization to make AI governance easier both for your team and for your business partners, who are key stakeholders in the governance process.

Key design principles

At a high level, to ensure that models are governable and can be assured, model artifacts should exhibit the following three principles:

  • Context: After the initial exploratory stages of model development, the business reasons, scope, risks, limitations, and data modeling approaches are well-defined and fully documented prior to a model going into production.
  • Verifiability: Every business and technical decision and step in the model development process should be open to verification and interrogation. An ML model pipeline should never be a complete “black box,” even if a black box algorithm is used. Understanding where the data came from, how it was processed, and what regulatory considerations exist is paramount for building a verifiable model. Model code should be constructed and documented so that it is comprehensible to someone who hasn’t looked at the code before. The model should be built so that reperforming individual transactions is possible, using containerized architectures, serialization (via pickle or an equivalent), and deterministic preprocessing techniques (e.g., a fitted and serialized Scikit-learn one-hot encoder, with fixed random seeds for any steps that involve randomness).
  • Objectivity: The gold standard of governance is when any ML application can be reasonably evaluated and understood by an objective individual or party not involved in the model development. If an ML system is built with the prior two principles of context and verifiability, it is far more likely that your business partners can act effectively as that second-line and third-line objective party to evaluate it and greenlight your work to go into production.

Key capabilities to incorporate into models

Due to the ever-evolving landscape of open source libraries, vendors, and approaches to building ML models as well as the shortage of qualified ML engineers, there is a significant lack of industry best practices for creating deployable, maintainable, and governable ML models.

When developing ML models with governance in mind, the most important considerations are reperformance, version control, interpretability, and ease of deployment and maintainability.


Reperformance

Reperformance is the ability to reperform or reproduce a transaction or a model training run and obtain identical results. Much has been said about the “reproducibility crisis” in science, and the AI/ML community is not immune from this criticism.

Creating machine learning systems that are reproducible is definitely possible, and putting in the effort up front to do so ultimately yields more robust, dependable deployments, fewer headaches for developers, and fewer questions from auditors and other reviewers.

Some key best practices to keep in mind:

  • Pipeline objects should be used to encapsulate the pre-processing functions (e.g., scaling and one-hot encoding), the model, and post-processing techniques (if applicable) into one object. This pipeline object should be saved in a common serialization format such as pickle or joblib.
  • Pre-processing and post-processing logic that is not in the pipeline object should reside in a single .py file.
  • Set fixed random seeds for model training, fitting, and processing to ensure consistent, repeatable results.
  • Use a version control system such as Git for all code storage.
  • Document, document, document your process.
  • Document your data lineage, provide a data dictionary, and understand exactly where your data came from and what it does.
  • Document how your model performs and why specific decisions were made on feature selection, engineering, and model training.
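
To make the pipeline, serialization, and seeding practices concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a recommended implementation: the fitted "pipeline" is a plain dict rather than a Scikit-learn pipeline object, and the function names (fit_pipeline, transform, predict), the "amount" and "channel" fields, and the data are all hypothetical.

```python
import math
import pickle
import random


def transform(pipe, row):
    """Deterministic preprocessing: standard scaling plus one-hot encoding."""
    scaled = (row["amount"] - pipe["mean"]) / pipe["std"]
    one_hot = [1.0 if row["channel"] == c else 0.0 for c in pipe["categories"]]
    return [1.0, scaled] + one_hot  # the leading 1.0 is the bias term


def predict(pipe, row):
    """Score one transaction with the fitted logistic regression weights."""
    z = sum(w * x for w, x in zip(pipe["weights"], transform(pipe, row)))
    return 1.0 / (1.0 + math.exp(-z))


def fit_pipeline(rows, labels, seed):
    """Fit preprocessing and model together and return them as one artifact."""
    rng = random.Random(seed)  # seeded local generator, no hidden global state
    amounts = [r["amount"] for r in rows]
    mean = sum(amounts) / len(amounts)
    std = (sum((a - mean) ** 2 for a in amounts) / len(amounts)) ** 0.5 or 1.0
    pipe = {
        "seed": seed,  # stored with the artifact so training can be reperformed
        "categories": sorted({r["channel"] for r in rows}),  # sorted: fixed column order
        "mean": mean,
        "std": std,
    }
    pipe["weights"] = [rng.uniform(-0.01, 0.01) for _ in range(len(pipe["categories"]) + 2)]
    order = list(range(len(rows)))
    for _ in range(50):  # logistic regression trained by stochastic gradient descent
        rng.shuffle(order)  # shuffling is driven by the same seeded generator
        for i in order:
            x = transform(pipe, rows[i])
            error = predict(pipe, rows[i]) - labels[i]
            pipe["weights"] = [w - 0.1 * error * xi for w, xi in zip(pipe["weights"], x)]
    return pipe


# Hypothetical training data, for illustration only.
ROWS = [
    {"amount": 20.0, "channel": "web"},
    {"amount": 250.0, "channel": "phone"},
    {"amount": 35.0, "channel": "web"},
    {"amount": 400.0, "channel": "branch"},
    {"amount": 15.0, "channel": "phone"},
    {"amount": 310.0, "channel": "branch"},
]
LABELS = [0, 1, 0, 1, 0, 1]

pipe_a = fit_pipeline(ROWS, LABELS, seed=42)
pipe_b = fit_pipeline(ROWS, LABELS, seed=42)  # reperform the training run
restored = pickle.loads(pickle.dumps(pipe_a))  # serialize, then reload the artifact

transaction = {"amount": 120.0, "channel": "web"}
same_training = pipe_a == pipe_b
same_prediction = predict(restored, transaction) == predict(pipe_a, transaction)
print(same_training, same_prediction)
```

Because every source of randomness flows through one seeded generator and the category order is fixed by sorting, retraining with the same seed and data yields an identical artifact, and the reloaded artifact scores a transaction exactly as the original does. That is the property an auditor relies on when reperforming a decision.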


Interpretability

Creating an ML system that is interpretable, or understandable to non-experts, is a key component of creating a governable ML system. To create an interpretable model, the following are key considerations and best practices:

  • Simpler is often better, and you should avoid using more complex models without trying a simpler model first. In other words, don’t use a deep neural network if a logistic regression model performs almost as well. In cases when a more complex model is chosen, you should document and justify why such a model is required for the business use case.
  • Apply a common explainability technique such as Anchors or SHAP to your model. Ensure that the model supports both local interpretability (explaining individual transactions) and global interpretability (explaining overall model behavior).
  • Ensure that your model has been evaluated independently for accuracy, business context, and understandability, and that it performs as expected when individual inputs are passed in.
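
The difference between local and global interpretability can be sketched with a simple linear model, where each feature's contribution to a score can be computed directly. This is a minimal illustration, not SHAP or Anchors themselves, although for a linear model with features treated as independent, these contributions coincide with SHAP values. The weights, feature names, and data below are hypothetical.

```python
# Hypothetical linear scoring model; weights and data are illustrative only.
WEIGHTS = {"income": 0.5, "debt_ratio": -2.0, "tenure_years": 0.25}
BIAS = 1.0
BACKGROUND = [  # stand-in for a sample of training data
    {"income": 4.0, "debt_ratio": 0.5, "tenure_years": 2.0},
    {"income": 6.0, "debt_ratio": 0.25, "tenure_years": 8.0},
    {"income": 2.0, "debt_ratio": 0.75, "tenure_years": 2.0},
]


def score(row):
    """Linear model output for one row."""
    return BIAS + sum(WEIGHTS[f] * row[f] for f in WEIGHTS)


def feature_means(rows):
    """Average value of each feature over the background data."""
    return {f: sum(r[f] for r in rows) / len(rows) for f in WEIGHTS}


def local_explanation(row, means):
    """Contribution of each feature to one transaction, relative to the average input."""
    return {f: WEIGHTS[f] * (row[f] - means[f]) for f in WEIGHTS}


def global_importance(rows, means):
    """Mean absolute contribution of each feature across a dataset."""
    totals = {f: 0.0 for f in WEIGHTS}
    for r in rows:
        for f, contribution in local_explanation(r, means).items():
            totals[f] += abs(contribution)
    return {f: total / len(rows) for f, total in totals.items()}


means = feature_means(BACKGROUND)
baseline = score(means)  # the score of an "average" applicant

transaction = {"income": 3.0, "debt_ratio": 1.0, "tenure_years": 4.0}
local = local_explanation(transaction, means)
importance = global_importance(BACKGROUND, means)

print("baseline score:", baseline)
print("local contributions:", local)
print("global importance:", importance)
```

The local contributions add up to the gap between this transaction's score and the baseline, which gives a reviewer a complete account of one decision. The global importance summarizes which features drive the model overall. More complex models need a dedicated explainability library to produce comparable outputs.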

Deployment maturity

For a technical organization, the final dimension of governability lies in maturing your approach to how you deploy models into production. Following standard engineering and API development best practices will go a long way toward governable AI. You will additionally want to focus on deploying a scalable model that is robust when facing adversarial inputs and shocks in request volume. Here are some specific practices to employ that might be relevant for your team:

  • Deploy and productionize all models using a standard process, such as in a container orchestration system. In particular, you should have a thorough peer review process with a special eye toward ensuring that software engineers have an opportunity to harden code created by data scientists, who may not have the same degree of experience with hardening code for production.
  • Encapsulate pre-processing and post-processing code in pipeline objects or single files for reproduction and auditability. Model serving should be separated into a server file that loads the pipeline object (or model and processing pickle files) and a Python file that has the pre-processing, model prediction, and post-processing logic.
  • Confirm all model inputs, results, explainability, and relevant metadata are logged in sufficient detail for post mortems and traceability of model transactions.
  • Adhere to a standard REST API deployment paradigm, ideally with a containerized solution with safeguards in place. You should avoid dynamic processes in the pre-processing and post-processing logic. If a call is not deterministic, your model is not reproducible and therefore cannot be governed effectively.
  • Ensure that your application architecture and security are front and center when building a sustainable and trustworthy AI system. Your model, code, artifacts, and systems should adhere to the principle of least privilege and any other relevant security practices for your organization. You should also have strong access and security controls (IT general controls) in place to protect the system from tampering.
  • Verify that monitoring processes are appropriate and sufficient to provide timely identification if the model behaves unexpectedly. Model concept drift and feature drift are pervasive problems in deployed machine learning models. Having monitoring in place to detect when drift begins to occur is absolutely essential for a long-term successful ML deployment.
  • Confirm the model has been thoroughly and routinely tested—manually by an independent, non-technical party using standardized controls—to ensure that the model is performing as expected and is resistant to adversarial inputs. Periodic manual one-off testing and validation of the model are crucial to ensuring that the model is operating as intended.
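
The logging and monitoring practices above can work together: the transaction log that supports traceability can also feed a drift check. The following is a minimal sketch under stated assumptions. The record layout, the MODEL_VERSION tag, the bin edges, and the data are hypothetical, the in-memory list stands in for a real log store, and the Population Stability Index (PSI) is one common drift measure chosen here for illustration, with 0.25 used as a commonly cited rule-of-thumb alert level rather than a standard.

```python
import hashlib
import json
import math
from datetime import datetime, timezone

MODEL_VERSION = "credit-risk-1.3.0"  # hypothetical version tag
ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb PSI level for significant drift


def log_transaction(log, features, prediction, explanation):
    """Append one traceable record: inputs, output, explanation, and metadata."""
    payload = json.dumps(features, sort_keys=True)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": MODEL_VERSION,
        "input_hash": hashlib.sha256(payload.encode()).hexdigest(),
        "features": features,
        "prediction": prediction,
        "explanation": explanation,
    }
    log.append(json.dumps(record, sort_keys=True))  # one JSON line per transaction
    return record


def psi(expected, actual, edges):
    """Population Stability Index of `actual` against `expected` over fixed bin edges."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # index of the bin holding v
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical baseline: values of one feature seen during training.
training_amounts = [10, 20, 30, 40, 50, 60, 70, 80]
EDGES = [25, 50, 75]  # bin edges fixed at training time

# Simulated production traffic; the prediction and explanation are placeholders.
log = []
for amount in [65, 70, 85, 90, 95, 72, 88, 99]:
    log_transaction(log, {"amount": amount}, prediction=0.5, explanation={"amount": 0.0})

# Rebuild the live feature distribution from the log and compare it to training.
live_amounts = [json.loads(line)["features"]["amount"] for line in log]
drift = psi(training_amounts, live_amounts, EDGES)
if drift > ALERT_THRESHOLD:
    print(f"feature drift alert: PSI={drift:.2f}")
```

Hashing the sorted input gives each record a stable fingerprint that a reviewer can match against a reperformed transaction, and storing the model version alongside it ties the decision to a specific artifact. Running the drift check from the same log means the monitoring evidence and the audit evidence come from one source.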

A lot of work and focus over the past decade has been poured into pushing the boundaries of data processing and modeling algorithms. In industry, the biggest gap recently has not been a lack of ability to build machine learning models in Python, but how to properly govern and deploy such models, especially higher-risk models in highly regulated environments.

Focusing on these engineering best practices will go a long way toward providing the technical basics needed to build high-quality models that can be governed effectively. Most importantly, objective evaluators inside and outside the organization will have the ability to implement multiple lines of defense for your organization, drive model risk management, and enable audits that place a lower burden on technical teams.

The end result of building more governable AI will be to free your technical teams to focus on forward progress for their models because they have gained the trust of their business partners.

Andrew Clark is founding chief technology officer at Monitaur, an AI governance and ML assurance company. A trusted domain expert, Andrew built and deployed ML auditing solutions at Capital One and served as an economist and modeling advisor for several very prominent crypto projects at Block Science. He is currently a key contributor to ISO AI Standards, ISACA ML Auditing Guidance, and ICO AI Auditing Framework. Connect with Andrew on LinkedIn, and learn more about Monitaur at

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to

Copyright © 2021 IDG Communications, Inc.