Reinforcement learning comes into AI’s mainstream

Developers now have the open source tools to get started with reinforcement learning, a technology that is poised to become mainstream

One of the most noteworthy artificial intelligence trends in 2018 has been the maturation of reinforcement learning into a mainstream approach for building and training statistical models to do useful things.

As I explained earlier this year, reinforcement learning is taking an expanding role in enterprise AI initiatives. The technique has broken out of its traditional niches in robotics, gaming, and simulation, and it is now evident in a wide range of cutting-edge AI applications in IT operations management, energy, health care, commerce, transportation, and finance. It’s even integral to a new generation of AI solutions in social media, natural language processing, machine translation, computer vision, digital assistants, and more.

To make reinforcement learning algorithms easier to adopt in enterprise AI, developers need tools for collaborating on these projects and for deploying the resulting models into production environments. In that regard, several recent industry announcements illustrate the maturation of open-source workbenches, libraries, and devops pipelines for reinforcement-learning-focused AI initiatives.

Iterative reinforcement-learning development workbench

Many advances in reinforcement learning reach our lives either through mainstream apps we take for granted (like multiplayer online games) or through use cases so futuristic (like robotics) that we don’t realize how close they are to the mainstream. Reinforcement-learning agents can now play games at a superhuman level, as in the OpenAI Five competition.

Developers can avail themselves of a growing range of open-source reinforcement learning frameworks for gaming and robotics, including OpenAI’s Roboschool, Unity Technologies’ Machine Learning Agents, and Intel’s Nervana Coach. Developers also have access to open source reinforcement-learning frameworks that are extensible to a wide range of challenges. For example, Google’s TensorFlow Agents supports efficient batched reinforcement learning workflows, and UC Berkeley’s Ray RLlib provides a flexible task-based programming model for building agent-based reinforcement learning applications in TensorFlow and PyTorch.
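
The batched-workflow idea behind TensorFlow Agents can be sketched in a few lines: step many independent environment copies together so that per-step overhead is amortized across the batch. The classes below are illustrative stand-ins, not the TF Agents API:

```python
class CounterEnv:
    """Toy environment: the state counts steps; the episode ends at a limit."""
    def __init__(self, limit=3):
        self.limit = limit
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        done = self.state >= self.limit
        return self.state, 1.0, done


class BatchedEnv:
    """Steps a batch of independent environment copies together,
    auto-resetting any copy whose episode has ended."""
    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = []
        for env, action in zip(self.envs, actions):
            state, reward, done = env.step(action)
            if done:
                state = env.reset()  # start a fresh episode in this slot
            results.append((state, reward, done))
        return results


batch = BatchedEnv([CounterEnv(limit=2) for _ in range(4)])
states = batch.reset()
out1 = batch.step([0, 0, 0, 0])  # all four copies advance one step
out2 = batch.step([0, 0, 0, 0])  # all four hit the limit and auto-reset
```

In a real framework, the agent would pick a whole batch of actions at once, which is exactly what makes GPU-friendly inference efficient.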

What’s been lacking from many AI developers’ modeling toolkits is a fast, iterative reinforcement learning workbench that integrates with existing AI frameworks and is geared to a wide range of modeling and training challenges. To remedy this situation, Google recently announced Dopamine, a TensorFlow-based framework and code base for fast, iterative prototyping of reinforcement learning algorithms in Python 2.7. Dopamine, which tops GitHub's internal ranking of “cool open source projects,” supports the following core functions:

  • Development of reinforcement learning experiments on new research ideas: Dopamine includes compact, well-documented Python code that focuses on the Arcade Learning Environment (a mature, well-understood benchmark) and four value-based agents for executing in single-GPU environments: Deep Q-Networks (DQN), C51, a carefully curated simplified variant of the Rainbow agent, and the Implicit Quantile Network agent.
  • Obtaining reproducible results from reinforcement learning experiments: Dopamine includes a full test suite and implements a standard empirical framework for experiments with the Arcade Learning Environment.
  • Benchmarking of reinforcement learning results against established training methods: Dopamine includes full training data for the four provided agents across the 60 games supported by the Arcade Learning Environment, available as Python files for agents trained with the framework and as JSON data files for comparison with agents trained in other frameworks, as well as a website for visualizing training runs for all provided agents on all 60 games.
  • Accelerators to help reinforcement-learning development teams use the framework: Dopamine includes a set of Colab notebooks that clarify how to create, train, and benchmark reinforcement learning agents created in the framework. It also includes downloadable trained deep networks, raw statistics logs, and TensorFlow event files for plotting with TensorBoard.
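
The four Dopamine agents are all value-based: they learn action-value estimates and act (mostly) greedily on them. As a reminder of the core idea these agents build on, here is a minimal tabular Q-learning loop on a toy chain environment; the environment and hyperparameters are illustrative and are not part of Dopamine:

```python
import random

random.seed(0)

# Trivial deterministic chain: states 0..3, action 1 moves right, action 0 stays.
# Reaching state 3 yields reward 1 and ends the episode.
N_STATES, GOAL = 4, 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

def step(state, action):
    next_state = min(state + action, GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best next-state value.
        target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

# The learned greedy policy should always move right toward the goal.
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
```

DQN replaces the table with a deep network, and C51, Rainbow, and implicit quantile networks refine how the value estimate is represented, but the update structure is recognizably the same.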

Modular reinforcement-learning agent development library

Advances in reinforcement learning depend on building intelligent agents that can autonomously take optimal actions in diverse real-world scenarios.

AI researchers are continually pushing the envelope in intelligent, distributed agents powered by trained reinforcement-learning models. For example, UC Berkeley recently published research on round-robin iterative reinforcement-learning acceleration in a distributed agent environment. It involves training one agent module at a time while others follow simple scripted behaviors, and then the environment “replaces the scripted component of another module with a neural network policy, which continues to train while the previously trained modules remain fixed.”
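
The round-robin scheme can be sketched as a scheduler that marks one module trainable per round while the rest stay fixed or scripted. All class and module names below are illustrative, not from the Berkeley code:

```python
class Module:
    """One agent module: either follows a fixed script or 'trains'."""
    def __init__(self, name):
        self.name = name
        self.trainable = False
        self.updates = 0

    def act(self):
        if self.trainable:
            self.updates += 1            # stand-in for a gradient step
            return f"{self.name}:learned"
        return f"{self.name}:scripted"   # frozen or scripted behavior


def round_robin_train(modules, rounds, steps_per_round):
    """Train one module at a time while the others stay fixed."""
    for r in range(rounds):
        active = modules[r % len(modules)]
        for m in modules:
            m.trainable = (m is active)
        for _ in range(steps_per_round):
            for m in modules:
                m.act()


mods = [Module("arm"), Module("gripper"), Module("base")]
round_robin_train(mods, rounds=6, steps_per_round=10)
update_counts = [m.updates for m in mods]  # each module trained in 2 of 6 rounds
```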

To accelerate development of reinforcement-learning-optimized intelligent AI bots, Google’s DeepMind group recently open-sourced TRFL, a new library of building blocks for developing reinforcement-learning agents in TensorFlow. It includes algorithms, loss functions, and other reinforcement-learning operations that the Research Engineering team at DeepMind has used internally for successful agents such as DQN, Deep Deterministic Policy Gradients (DDPG), and the Importance Weighted Actor Learner Architecture (IMPALA). These building blocks can be used to build new reinforcement learning agents with a consistent API.
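
The flavor of such a building block: a self-contained loss function that turns one transition into a TD error and a loss, which any agent can reuse regardless of its network architecture. This is a plain-Python sketch of the Q-learning update in the spirit of TRFL, not TRFL’s actual TensorFlow API:

```python
def qlearning_loss(q_tm1, a_tm1, r_t, pcont_t, q_t):
    """Squared TD error for one transition (Q-learning).

    q_tm1:   list of action values at the previous state
    a_tm1:   action taken at the previous state
    r_t:     reward received
    pcont_t: continuation discount (0.0 at episode end)
    q_t:     list of action values at the next state
    """
    target = r_t + pcont_t * max(q_t)      # bootstrap from best next action
    td_error = target - q_tm1[a_tm1]
    return 0.5 * td_error ** 2, td_error


loss, td = qlearning_loss(
    q_tm1=[0.0, 1.0], a_tm1=1, r_t=1.0, pcont_t=0.9, q_t=[2.0, 0.0]
)
# target = 1.0 + 0.9 * 2.0 = 2.8, so td = 1.8 and loss = 0.5 * 1.8**2
```

Packaging updates this way is what lets a team swap loss functions in and out of an agent without touching the rest of its training loop.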

DeepMind is also open-sourcing complete reinforcement-learning agent implementations, including such components as deep-network computational graphs that represent values and policies, as well as learned models of the environment, pseudoreward functions, and replay systems. It is doing this to help the reinforcement-learning community identify and fix bugs in these agents more quickly, while boosting the reproducibility of results from reinforcement-learning projects that use these agents. DeepMind will continue to maintain the TRFL library, add new functionality, and accept community contributions.

End-to-end reinforcement-learning devops pipeline tool

Reinforcement-learning modeling is typically done offline from production applications, with the trained models being served into operational environments only when they’ve been proven out in simulators.

As reinforcement learning becomes fundamental to more AI applications, modeling frameworks need to evolve to handle training inline with live online applications. As with other AI methodologies, more reinforcement learning initiatives are being integrated into the devops pipelines that drive data preparation, modeling, training, and other pipeline workloads.

With that in mind, Facebook recently open-sourced its reinforcement learning toolkit Horizon, which is designed to be deployed into AI devops pipelines. The open source Horizon code is available to download via GitHub. Horizon incorporates reinforcement-learning technology that Facebook has been using operationally for scalable production apps. For example, the social media giant uses reinforcement learning for production AI applications such as predicting which notifications users are most likely to respond to, personalizing suggestions from Facebook’s virtual messaging assistant, and deciding the video quality level to stream to users based on their location or the strength of their cellular signal.

Horizon is an end-to-end pipeline for reinforcement learning-focused AI projects where data sets are large, the feedback loop from target applications is slow, and the business risk of failed reinforcement-learning experiments is high because they involve production applications. It supports reinforcement-learning modeling in high-dimensional discrete and continuous action spaces. It includes implementations of DQN with dueling architecture for discrete action spaces and DDPG for continuous action spaces. It contains automated workflows for training popular deep-reinforcement-learning algorithms in multi-GPU distributed environments, as well as for CPU, GPU, and multi-GPU training on single machines. It includes utilities for data preprocessing, feature normalization, distributed training, and optimized serving.
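
One of the simpler pipeline stages, feature normalization, can be sketched as fitting per-feature statistics on training data and applying them at serving time so features with very different distributions share a comparable scale. The feature names here are hypothetical, and this is not Horizon’s actual implementation:

```python
from statistics import mean, pstdev

def fit_normalization(rows):
    """Compute per-feature (mean, std) from training rows,
    where each row is a dict of feature name -> value."""
    params = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        params[name] = (mu, sigma if sigma > 0 else 1.0)  # guard constants
    return params

def normalize(row, params):
    """Z-score each feature using the fitted statistics."""
    return {name: (row[name] - mu) / sigma
            for name, (mu, sigma) in params.items()}

train = [{"latency_ms": 100.0, "signal": 0.2},
         {"latency_ms": 300.0, "signal": 0.8}]
params = fit_normalization(train)
normed = normalize({"latency_ms": 200.0, "signal": 0.5}, params)
```

At Facebook’s stated scale of hundreds or thousands of feature types, the point of automating this step is that no one hand-tunes scales per feature.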

As befits Facebook’s scale requirements, Horizon is designed to support reinforcement-learning modeling and training on applications with data sets that might have hundreds or even thousands of feature types, each with unique statistical distributions. It uses Spark for data preparation and dimensionality reduction, the PyTorch framework for reinforcement-learning modeling and training, and the Caffe2 AI framework and Open Neural Network Exchange for reinforcement-learning model serving into thousands of production environments.

To mitigate the risk of deploying suboptimal reinforcement learning models into production applications, Horizon incorporates a feature called “counterfactual policy evaluation,” which lets data scientists estimate reinforcement-learning algorithm performance offline before a trained model is deployed. Without this automated feature, developers would need to conduct costly, time-consuming A/B tests to search for optimal reinforcement-learning models and hyperparameters among myriad candidates. In the reinforcement-learning training workflow, Horizon scores trained models using such counterfactual policy evaluation methods as stepwise importance sampling estimator, stepwise direct sampling estimator, stepwise doubly robust estimator, and sequential doubly robust estimator.
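
The simplest of these estimators, stepwise importance sampling, reweights each logged reward by the cumulative ratio of target-policy to behavior-policy action probabilities, yielding an offline estimate of how the new policy would have scored. A minimal sketch, not Horizon’s implementation:

```python
def stepwise_is_estimate(episodes, target_prob, gamma=1.0):
    """Per-decision (stepwise) importance sampling estimate of a
    target policy's value from logged episodes.

    episodes: list of trajectories; each step is a tuple
      (state, action, reward, behavior_prob).
    target_prob(state, action): probability the target policy
      would pick `action` in `state`.
    """
    total = 0.0
    for episode in episodes:
        weight = 1.0     # cumulative importance ratio along the trajectory
        discount = 1.0
        for state, action, reward, behavior_prob in episode:
            weight *= target_prob(state, action) / behavior_prob
            total += discount * weight * reward
            discount *= gamma
    return total / len(episodes)


# One logged episode from a uniform behavior policy (prob 0.5 per action).
logs = [[("s0", "a", 1.0, 0.5), ("s1", "b", 1.0, 0.5)]]

# Hypothetical target policy that always picks action "a".
always_a = lambda state, action: 1.0 if action == "a" else 0.0

value = stepwise_is_estimate(logs, always_a)
# Step 1: weight = 1/0.5 = 2, contributing 2.0.
# Step 2: the logged action "b" has target probability 0, so weight drops to 0.
```

The doubly robust variants Horizon also supports combine this reweighting with a learned value model to reduce the estimator’s variance.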

To support testing of reinforcement-learning algorithm performance, Facebook has integrated Horizon with the Cartpole and Pendulum environments of the popular benchmarking library OpenAI Gym, and also with a custom Gridworld environment. Horizon includes tools for conducting unit, integration, and performance tests of data preprocessing, feature normalization, and other Horizon reinforcement learning modeling, training, and serving features. It evaluates Discrete-Action DQN, Parametric-Action DQN, and DDPG models with different configurations—such as using Q-learning versus SARSA, or with and without double Q-learning—to ensure reinforcement-learning model robustness and correctness. It performs integration tests on prebuilt Docker images of the target platforms.
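
The Q-learning-versus-SARSA distinction mentioned above comes down to how the bootstrap target is formed: Q-learning bootstraps off-policy from the best next action, while SARSA bootstraps on-policy from the action the agent actually took next. A minimal sketch:

```python
def q_learning_target(r, gamma, q_next):
    """Off-policy target: bootstrap from the best next action."""
    return r + gamma * max(q_next)

def sarsa_target(r, gamma, q_next, a_next):
    """On-policy target: bootstrap from the action actually taken next."""
    return r + gamma * q_next[a_next]


q_next = [0.5, 2.0]              # next-state action values
ql = q_learning_target(1.0, 0.9, q_next)        # 1.0 + 0.9 * 2.0 = 2.8
sa = sarsa_target(1.0, 0.9, q_next, a_next=0)   # 1.0 + 0.9 * 0.5 = 1.45
```

Testing both configurations, as Horizon’s suite does, catches bugs that only surface under one of the two target definitions.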

The tools are here to start hands-on learning

If you’re an AI developer, many of the algorithms listed here may still be unfamiliar. However, you’re probably beginning to bring reinforcement learning into your development initiatives, or at the very least to dabble with the open source tools.

In 2019, you can expect the AI industry to incorporate the most widely adopted reinforcement learning frameworks into its workbenches. These frameworks will become as familiar to mainstream developers as convolutional and recurrent neural networks have become in supervised-learning contexts.

Before long, most AI devops workflows will seamlessly incorporate reinforcement learning alongside supervised and unsupervised learning to power more sophisticated embedded intelligence in production enterprise applications.

Copyright © 2018 IDG Communications, Inc.