Bayesian Active Reinforcement Learning
Viraj Mehta · Biswajit Paria · Jeff Schneider · Willie Neiswanger

Tue Dec 14 09:00 AM -- 11:00 AM (PST)
Many current reinforcement learning algorithms explore by adding some form of randomness to the policy that is optimal given current knowledge. Here we take a different strategy and instead leverage ideas from Bayesian Optimal Experimental Design to guide exploration in RL for improved data efficiency. In particular, we first construct an acquisition function that characterizes the value that a given data point provides for reinforcement learning. To the best of our knowledge, this is the first study that gives a practical task-aware criterion for evaluating the relative value of acquiring additional data. We also give a practical method for computing this quantity, given a dataset of transitions from a Markov Decision Process (MDP). Using this acquisition function, we develop an algorithm for reinforcement learning with access to a generative model of the environment, a setting that has been thoroughly studied in the tabular case but has not yet seen algorithms for continuous MDPs. Our algorithm is able to solve a variety of simulated continuous control problems using 5 to 1,000 times less data than model-based reinforcement learning algorithms and $10^3$ to $10^5$ times less data than model-free techniques. We give several ablated comparisons, which indicate that substantial improvements arise both from the ability to operate in a generative setting and from the principled method of selecting data.
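
To make the idea of acquisition-driven querying concrete, here is a minimal sketch of a generic Bayesian experimental design loop with a generative model. It is not the authors' algorithm. It assumes a toy linear-Gaussian transition model, and it scores candidate queries by the expected information gain about the model weights, which is a task-agnostic stand-in for the task-aware criterion described above. All names (`expected_information_gain`, `active_learn`, `toy_generative_model`) and all parameter values are hypothetical.

```python
import numpy as np

def expected_information_gain(x, cov, noise_var):
    """EIG (in nats) about the weights from observing y = w.x + noise at x.

    Closed form for Bayesian linear regression with Gaussian noise.
    Assumption: this scores information about the model, not about the task.
    """
    return 0.5 * np.log1p(x @ cov @ x / noise_var)

def posterior_update(mean, cov, x, y, noise_var):
    """Rank-one Gaussian posterior update for a single observation (x, y)."""
    cov_x = cov @ x
    gain = cov_x / (noise_var + x @ cov_x)
    new_mean = mean + gain * (y - x @ mean)
    new_cov = cov - np.outer(gain, cov_x)
    return new_mean, new_cov

def active_learn(generative_model, candidates, noise_var, n_queries, prior_var=10.0):
    """Greedily query the candidate input with the highest acquisition value."""
    dim = candidates.shape[1]
    mean, cov = np.zeros(dim), prior_var * np.eye(dim)
    for _ in range(n_queries):
        scores = np.array(
            [expected_information_gain(x, cov, noise_var) for x in candidates]
        )
        x = candidates[np.argmax(scores)]
        # The generative setting lets us sample any (state, action) directly.
        y = generative_model(x)
        mean, cov = posterior_update(mean, cov, x, y, noise_var)
    return mean, cov

# Hypothetical toy system: next_state = 0.9 * state + 0.5 * action + noise.
rng = np.random.default_rng(0)
true_w = np.array([0.9, 0.5])
noise_std = 0.05

def toy_generative_model(x):
    return true_w @ x + noise_std * rng.normal()

# Candidate (state, action) pairs that the learner may query.
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
mean, cov = active_learn(toy_generative_model, candidates, noise_std**2, n_queries=30)
```

In this sketch the learner chooses which transition to observe instead of following a noisy policy, which is the structural difference from randomness-based exploration. A task-aware acquisition function would replace `expected_information_gain` with a score tied to the control objective.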