
Bayesian Active Reinforcement Learning
Viraj Mehta · Biswajit Paria · Jeff Schneider · Willie Neiswanger
Many current reinforcement learning algorithms explore by adding some form of randomness to the policy that is optimal given current knowledge. Here we take a different approach and instead leverage ideas from Bayesian Optimal Experimental Design to guide exploration in RL for increased data efficiency. In particular, we first construct an acquisition function that characterizes the value that a given data point provides for reinforcement learning. To the best of our knowledge, this is the first study that gives a practical task-aware criterion for evaluating the relative value of acquiring additional data. We also give a practical method for computing this quantity, given a dataset of transitions from a Markov Decision Process (MDP). Using this acquisition function, we develop an algorithm for reinforcement learning with access to a generative model of the environment. This setting has been thoroughly studied in the tabular case, but no algorithms have previously been developed for it with continuous MDPs. Our algorithm is able to solve a variety of simulated continuous control problems using 5 to 1,000 times less data than model-based reinforcement learning algorithms and $10^3$ to $10^5$ times less data than model-free techniques. We give several ablation studies, which point to substantial improvements arising both from the ability to operate in a generative setting and from the principled method of obtaining data.

Author Information

Viraj Mehta (Carnegie Mellon University)
Biswajit Paria (Carnegie Mellon University)
Jeff Schneider (Carnegie Mellon University)
Willie Neiswanger (Carnegie Mellon University)
