In a world of pervasive AI, testing will be a nightmare and metrics will be key

The sheer amount of data available today demands that enterprises automate more processes and make more decisions with artificial intelligence


Artificial intelligence’s inroads into the enterprise are fueled by the availability of more diverse data. More interesting data is available in higher volumes because the number of systems, applications, processes and interfaces that have been instrumented keeps increasing. This availability of data is leading enterprises into an unprecedented phase of enterprise automation.

In this phase, enterprises will integrate more predictive decisions in their processes. These decisions will be powered by one or more AI models. Probabilistic decision making will prove to be a huge boon for the enterprise. However, the introduction of probabilistic decision making will unleash a new level of quality and testing challenges for the enterprise. It will force a marked shift in how the industry performs quality assurance and how test metrics are designed and generated.

New quality considerations

Same data, multiple AI scenarios

In this scenario, the same data set is used to power AI models in multiple business problems and scenarios. A single data set can contain multiple embedded signals. Different AI scenarios can leverage these embedded signals to drive different types of probabilities and outcomes.

Same data, multiple models, same AI scenario

In this scenario, the same data is used to generate multiple AI models using different AI techniques to power the same AI scenario/business problem. Different algorithms and techniques leverage the embedded signals and structure of the data in different ways to produce AI models that consequently can behave very differently.

Transformed data, multiple AI scenarios

In this scenario, a data set is transformed through several ETL mechanisms to power very different AI scenarios/business problems. The transformation of the data can vary between the following:

  • Sampling: A subset of the data set is leveraged; the subset may, but does not have to, be generated randomly.
  • Filtering: The training data set is designed to include or exclude certain types of rows or signals.
  • Projections: The training data set is designed to include a subset of attributes available in the data set.
  • Aggregations: The training data set is built through aggregations, across a particular set of attributes or over time.
  • Derivations: The training data set is built through one or more attribute level transformations such as string to integer, integer to categorization, binning, etc.
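The five transformation types above can be illustrated with a minimal sketch in plain Python. The data set, attribute names, and the `age_bucket` binning rule are all hypothetical, chosen only to make each transformation concrete:

```python
import random

# Hypothetical raw data set: each row is a dict of attributes.
rows = [
    {"user": "a", "age": 34, "country": "US", "spend": 120.0},
    {"user": "b", "age": 29, "country": "DE", "spend": 80.0},
    {"user": "c", "age": 41, "country": "US", "spend": 200.0},
    {"user": "d", "age": 23, "country": "FR", "spend": 60.0},
]

# Sampling: a subset of the rows (here drawn randomly, with a fixed seed).
sample = random.Random(0).sample(rows, k=2)

# Filtering: include only rows that match a predicate.
filtered = [r for r in rows if r["country"] == "US"]

# Projection: keep only a subset of the available attributes.
projected = [{"user": r["user"], "spend": r["spend"]} for r in rows]

# Aggregation: roll the data up across a particular attribute.
spend_by_country = {}
for r in rows:
    spend_by_country[r["country"]] = spend_by_country.get(r["country"], 0.0) + r["spend"]

# Derivation: an attribute-level transformation, e.g. binning age into categories.
def age_bucket(age):
    return "young" if age < 30 else "middle" if age < 40 else "older"

derived = [{**r, "age_group": age_bucket(r["age"])} for r in rows]
```

Each of these produces a different training data set from the same raw rows, which is exactly why two AI scenarios fed from the "same" source can behave very differently.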

AI daisy chains

In this scenario, multiple AI models are built and connected to each other either digitally or through an analog, human-powered connection. For example, the output of one AI model can determine an outcome that is entered into a business workflow and potentially into a second AI model, or a person can use the first model’s outcome to decide the next step. In this scenario, the quality of the second AI model’s outcome can vary with the quality of the first model’s outcome.
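A minimal sketch of a digital daisy chain, using two hypothetical models (`churn_risk` and `retention_offer` are stand-ins, not real systems): the first model's output is the second model's input, so any error upstream propagates directly into the downstream decision.

```python
def churn_risk(activity_score):
    """Hypothetical upstream model: probability a customer churns."""
    return max(0.0, min(1.0, 1.0 - activity_score / 100.0))

def retention_offer(churn_probability):
    """Hypothetical downstream model: picks an offer tier from upstream output."""
    if churn_probability > 0.7:
        return "aggressive_discount"
    if churn_probability > 0.3:
        return "loyalty_bonus"
    return "no_action"

# Digital daisy chain: the output of churn_risk feeds retention_offer,
# so a miscalibrated churn score directly changes the offer decision.
offer = retention_offer(churn_risk(activity_score=20))
```

If `churn_risk` over-predicts by even a modest margin, `retention_offer` crosses a decision threshold and the business cost changes, even though the second model itself is working "correctly."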

Testing best practices

Enterprise-wide data transformation map

Enterprises need to ensure that they build and maintain a comprehensive, enterprise-wide data transformation map. This enterprise-wide data transformation map should describe how data is taken from raw data sources, transformed and fed into AI models.

Having an enterprise-wide data transformation map makes it straightforward to determine the provenance of AI models. This is required to determine the impact of upstream data quality issues on the AI model and on the business workflows that the AI model impacts.
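One way such a map can be represented is as a simple lineage graph. This is a sketch under assumed node names (the data-set and model names are invented for illustration): each node lists its direct upstream sources, and two small walks answer the provenance and impact questions above.

```python
# Hypothetical transformation map: each node lists its direct upstream sources.
transformation_map = {
    "raw_clickstream": [],
    "raw_orders": [],
    "sessionized_clicks": ["raw_clickstream"],
    "training_set_v1": ["sessionized_clicks", "raw_orders"],
    "churn_model": ["training_set_v1"],
}

def provenance(node, graph):
    """All upstream sources that feed into a node, via a depth-first walk."""
    seen = set()
    def walk(n):
        for parent in graph.get(n, []):
            if parent not in seen:
                seen.add(parent)
                walk(parent)
    walk(node)
    return seen

def downstream_impact(source, graph):
    """All nodes whose provenance includes the given source."""
    return {n for n in graph if source in provenance(n, graph)}
```

With this in place, a quality issue discovered in `raw_clickstream` can be traced forward to every model and training set it touches, rather than being investigated model by model.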

Data transformation semantic profiling

Enterprises also need to invest in profiling their data transformations both operationally and semantically. Semantic data profiling determines the patterns and structure of the output data set that is generated after data transformations are applied to a raw data set.

Determining such patterns in the transformed version of the data can be used to profile a data transformation technique. When errors in the data or in its transformation techniques change the profile of the data transformation, alerts can be generated and the impact on the quality of downstream AI models can be estimated.
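A minimal sketch of what such profiling and alerting could look like (the profile fields and the 20% tolerance are illustrative assumptions, not a standard): compute a small statistical profile of a transformed column, then flag fields that drift beyond tolerance from a baseline profile.

```python
def profile(rows, column):
    """A minimal semantic profile of one output column: null rate, mean, range."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "null_rate": 1 - len(values) / len(rows),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

def drift_alerts(baseline, current, tolerance=0.2):
    """Flag profile fields that moved more than `tolerance` (relative) from baseline."""
    alerts = []
    for key, base in baseline.items():
        cur = current[key]
        denom = abs(base) if base else 1.0  # avoid dividing by a zero baseline
        if abs(cur - base) / denom > tolerance:
            alerts.append(key)
    return alerts
```

An alert on, say, the mean of a transformed column does not prove the downstream model is wrong, but it tells the test team which models to re-evaluate first.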

Ceiling and floor constraints in AI daisy chains

Enterprises should invest in AI workflow capabilities that enable ceiling and floor constraints on the usage of the output of an upstream AI model in a downstream business workflow. In addition, these constraints should be configurable and overridable while being closely monitored to ensure that the consumer of the output of an upstream AI model is able to understand and judiciously use the output.
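As a sketch of the idea (the function and its parameters are hypothetical): clamp an upstream model's output to configured floor and ceiling values before a downstream workflow consumes it, and log any override so that the monitoring called for above has something to monitor.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_constraints")

def apply_constraints(model_output, floor, ceiling, override=None):
    """Clamp an upstream model's output before it enters a downstream workflow.

    `override`, if given, bypasses the clamp but is logged for close monitoring.
    """
    if override is not None:
        log.info("constraint override: model=%s override=%s", model_output, override)
        return override
    clamped = max(floor, min(ceiling, model_output))
    if clamped != model_output:
        log.info("constraint applied: %s -> %s", model_output, clamped)
    return clamped
```

The design point is that the constraint is configurable (floor/ceiling per workflow) and overridable, but never silently so: every override leaves an audit trail.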

AI test metrics

Enterprise test disciplines need to invest in AI test metrics that are capable of meticulously determining and testing the quality of not just individual transformations or AI models but the quality of entire AI-driven business workflows. In addition to low-level test metrics, test metrics need to include the measurement of whether the entire AI-driven business workflow is delivering on its goals and customer requirements.

Given the predictive nature of AI-driven workflows, a failure or suboptimal outcome might not become evident until the entire workflow has completed. The test metrics, and the systems that collect the data behind them, therefore need to be instrumented to capture the final outcome of the business workflow in order to deliver comprehensive test metrics and quality determinations.
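One way to structure that instrumentation is sketched below (the class names and fields are assumptions for illustration): each workflow run records per-step quality signals and, separately, the final business outcome, so that an end-to-end success rate can be computed alongside the low-level metrics.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class WorkflowRun:
    """One pass through an AI-driven business workflow."""
    run_id: str
    step_outcomes: dict = field(default_factory=dict)  # per-model/step quality signals
    final_outcome: Optional[bool] = None  # did the workflow meet its business goal?

class WorkflowMetrics:
    """Collects runs and reports end-to-end success, not just per-step quality."""
    def __init__(self):
        self.runs = []

    def record(self, run):
        self.runs.append(run)

    def end_to_end_success_rate(self):
        """Fraction of completed runs that met the business goal, or None if none completed."""
        completed = [r for r in self.runs if r.final_outcome is not None]
        if not completed:
            return None
        return sum(r.final_outcome for r in completed) / len(completed)
```

The key property is that `final_outcome` is recorded after the workflow completes, which is exactly when, as argued above, failures in probabilistic decision making finally become visible.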