What is deep reinforcement learning: The next step in AI and deep learning

Reinforcement learning is well-suited for autonomous decision-making where supervised learning or unsupervised learning techniques alone can’t do the job

Reinforcement learning has traditionally occupied a niche status in the world of artificial intelligence, but in the past few years it has started to assume a larger role in many AI initiatives. Its sweet spot is calculating the optimal actions for agents to take in decision scenarios shaped by their environment.

Using trial-and-error approaches to maximize an algorithmic reward function, reinforcement learning is well suited to many adaptive-control and multiagent automation applications in IT operations management, energy, health care, commerce, finance, and transportation. And it’s being used to train the AI that powers both its traditional focus areas—robotics, gaming, and simulation—and a new generation of AI solutions in edge analytics, natural language processing, machine translation, computer vision, and digital assistants.

Reinforcement learning is also fundamental to the development of autonomous edge applications in the internet of things. Much of edge application development—for industrial, transportation, health care, and consumer applications—involves building AI-infused robotics that can operate with varying degrees of contextual autonomy under dynamic environmental circumstances.

How reinforcement learning works

In such application domains, edge devices’ AI brains must rely on reinforcement learning. Lacking a pre-existing “ground truth” training data set, these agents seek to maximize a cumulative reward function—for example, assembling a manufactured component according to the criteria laid out in a spec. This contrasts with how other types of AI learn: supervised learning minimizes an algorithmic loss function with respect to ground-truth data, while unsupervised learning minimizes a distance function among data points.
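The trial-and-error, reward-maximizing loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm. This is an illustrative toy example, not drawn from the article: a hypothetical agent in a five-state corridor learns, with no labeled training data, to walk right toward a goal state that yields its only reward.

```python
import random

# Toy 1-D corridor: states 0..4; the only reward comes from reaching state 4.
# All names, parameters, and the environment itself are hypothetical.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # action 0: move left, action 1: move right

def step(state, action):
    """Environment dynamics: move, clamp to the corridor, reward at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore randomly with probability epsilon,
            # otherwise exploit the best action learned so far.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = q[state].index(max(q[state]))
            nxt, r, done = step(state, ACTIONS[a])
            # Update the estimate of cumulative (discounted) future reward.
            q[state][a] += alpha * (r + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q = train()
# Greedy policy per non-terminal state: 1 means "move right" toward the goal.
policy = [q[s].index(max(q[s])) for s in range(N_STATES - 1)]
print(policy)
```

No ground-truth labels ever enter the loop; the value table is shaped entirely by the reward signal the agent discovers through its own actions, which is the distinction the paragraph above draws against supervised and unsupervised learning.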
