Boosting AI’s smarts in the absence of training data

Zero-shot learning repurposes knowledge through statistical or semantic approaches without needing huge amounts of fresh training data

AI (artificial intelligence) is the perfect field of dreams in modern culture. If you ask the average person on the street what AI runs on, they probably won’t mention training data. Instead, they might mumble something about computer programs that magically learn how to do useful stuff from thin air.

However, some of today’s most sophisticated AI comes close to that naïve dream. I’m referring to a still-developing approach known as “zero-shot learning.” This methodology—which is being explored at Microsoft, Uber, Baidu, Alibaba, and other AI-driven businesses—enables useful pattern recognition with little or no training data.

Zero-shot pattern learning will enable intelligent robots to dynamically recognize and respond to unfamiliar objects, behaviors, and environmental patterns that they may never have encountered in training. I predict that zero-shot approaches will increasingly be combined with reinforcement learning in order to enable robots to take the best actions iteratively in environments that are chaotic and one-off.

In addition, gaming applications will use zero-shot approaches such as iterative self-play as an alternative to training on voluminous data derived from successful gameplay. This will make it possible to train agents that master complex winning strategies despite knowing nothing about the games at the outset.
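
To make the self-play idea concrete, here is a minimal Python sketch. The game (a take-1-2-or-3 variant of Nim), the tabular learner, and the hyperparameters are illustrative assumptions rather than any vendor’s production system; the point is that the agent starts with no example games at all and improves purely from the outcomes of playing against itself.

import random
from collections import defaultdict

ACTIONS = (1, 2, 3)                # stones a player may remove on a turn
Q = defaultdict(float)             # Q[(stones_left, action)] -> learned value
ALPHA, DECAY, EPSILON = 0.2, 0.9, 0.1

def choose(stones):
    """Epsilon-greedy choice among the legal moves for the current player."""
    legal = [a for a in ACTIONS if a <= stones]
    if random.random() < EPSILON:
        return random.choice(legal)
    return max(legal, key=lambda a: Q[(stones, a)])

def self_play_episode(start=15):
    """Both players share the same improving policy; no gameplay data is used."""
    stones, history = start, []
    while stones > 0:
        action = choose(stones)
        history.append((stones, action))
        stones -= action
    # The player who removes the last stone wins. Credit every move with the
    # final outcome, flipping the sign each ply for the alternating players.
    outcome = 1.0
    for state, action in reversed(history):
        Q[(state, action)] += ALPHA * (outcome - Q[(state, action)])
        outcome = -DECAY * outcome

for _ in range(20000):
    self_play_episode()

# The learned policy typically recovers the known winning strategy for this
# game: leave the opponent a multiple of four stones whenever possible.
print({s: max((a for a in ACTIONS if a <= s), key=lambda a: Q[(s, a)])
       for s in range(1, 16)})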

Furthermore, zero-shot learning promises to make object recognition applications more versatile, due to its ability to drive:

  • on-the-fly recognition of rare, unfamiliar, and unseen objects that are largely or entirely absent from the training data.
  • recognition of patterns for which it is hard to obtain training data labeled with sufficient expert knowledge.
  • detection of instances of object classes where the proliferation of fine-grained categories makes it difficult or prohibitively expensive to acquire enough statistically diverse, labeled training data.

What makes zero-shot learning possible is the existence of prior knowledge that can be discovered and repurposed through statistical or semantic approaches. Zero-shot methods use this knowledge to predict the larger semantic space of features that encompasses both the seen instances (those in the training data) and the unseen instances (those missing from training data). Regarding automated knowledge discovery, some of the most promising technical approaches for zero-shot learning include:

  • Building classification models from statistical knowledge gained in prior supervised learning projects in distinct but semantically adjacent object-recognition domains (for example, identifying a never-seen class of vertebrate from features extracted from a related species).
  • Extracting semantic knowledge of target objects from textual descriptions of the targeted classes (for example, crawled Web articles that describe the visual features of the species to be recognized).
  • Using word vectors and other graph approaches to infer the semantic features of the target classes from those of the source classes, given textual descriptions of the target classes (a simplified sketch of this description-matching idea follows the list).
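
Here is a minimal numpy sketch of that description-matching idea, combining the spirit of the last two approaches. Each class is described by a small semantic attribute vector (the prior knowledge); a linear map from image features into that semantic space is fit on the seen classes only, and an image of an unseen class is then recognized by matching its predicted attributes to the class descriptions. The class names, attributes, and synthetic features are illustrative assumptions, not a real dataset or any vendor’s method.

import numpy as np

rng = np.random.default_rng(0)

# Semantic descriptions of each class: [has_hooves, has_stripes, is_large]
attributes = {
    "horse": [1, 0, 1],
    "tiger": [0, 1, 1],
    "pig":   [1, 0, 0],
    "zebra": [1, 1, 1],   # unseen class: no labeled zebra images exist
}
seen = ["horse", "tiger", "pig"]

# Synthetic "image features": an unknown linear function of the true attributes
# plus noise (a stand-in for CNN features correlated with visible traits).
D_FEAT = 12
true_map = rng.normal(size=(3, D_FEAT))

def images_of(cls, n=100):
    a = np.array(attributes[cls], dtype=float)
    return a @ true_map + 0.1 * rng.normal(size=(n, D_FEAT))

# Fit a least-squares map W from image features to attribute space,
# using labeled examples of the seen classes only.
X = np.vstack([images_of(c) for c in seen])
Y = np.vstack([np.tile(attributes[c], (100, 1)) for c in seen])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Recognize an image of the unseen class by nearest attribute description.
test = images_of("zebra", n=1) @ W
pred = min(attributes,
           key=lambda c: np.linalg.norm(test.ravel() - np.array(attributes[c])))
print(pred)   # expected to print "zebra" despite zero zebra training images

The same pattern scales up when the handcrafted attributes are replaced by word vectors or embeddings distilled from crawled textual descriptions of the target classes.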

Zero-shot learning can’t realize its potential as an AI pipeline accelerator unless data scientists acquire tools that provide simplified access to these techniques. That, in turn, requires deep learning toolkits that support easy visual design of new models from pre-existing functional building blocks under a larger paradigm known as “transfer learning.” This depends on workbenches that provide data scientists with reusable feature representations, neural-node layerings, weights, training methods, learning rates, and other relevant features of prior models that can be quickly brought into zero-shot AI projects.
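
To illustrate that pattern, here is a minimal PyTorch sketch of transfer learning in this spirit: a backbone pre-trained on ImageNet is reused wholesale, its weights are frozen, and only a small new classification head is trained for the target task. The choice of ResNet-18, the 10-class head, and the dummy batch are illustrative assumptions, not the workflow of any particular toolkit or workbench.

import torch
import torch.nn as nn
from torchvision import models

# Reuse a backbone trained on ImageNet: its layers, weights, and learned
# feature representations come from a prior project.
backbone = models.resnet18(pretrained=True)
for param in backbone.parameters():
    param.requires_grad = False          # keep the prior knowledge frozen

# Swap in a new classification head for the target domain (here, 10 classes).
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head's parameters are optimized; everything else is reused as-is.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (real code would loop over a
# labeled DataLoader for the new task).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()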

As zero-shot techniques gain adoption, and the pool of prior knowledge grows, developers of high-quality AI will grow less reliant on training data. During the next few years, we’ll see data scientists build more intelligent robotics, gaming, and pattern-recognition applications by configuring pre-existing statistical and semantic knowledge, without needing to acquire, prepare, and label huge amounts of fresh training data.

When that day arrives, more AI-based applications will be able to automate the bootstrapping of their intelligence from a state of pure ignorance to one of deep knowledge through techniques that are ad-hoc, zero-shot, and situationally adaptive.

That will mark the true beginning of artificial general intelligence, a dream that has motivated the AI community from the days of Alan Turing all the way to the present.
