
Workshop: Deep Reinforcement Learning

An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control

Nicholas Ioannidis · Jonathan Lavington · Mark Schmidt


Off-policy reinforcement learning (RL) algorithms can take advantage of samples generated from all previous interactions with the environment through "experience replay". Such methods outperform almost all on-policy and model-based alternatives in complex tasks where a structured or well-parameterized model of the world does not exist. This makes them desirable for practitioners who lack domain-specific knowledge but still require high sample efficiency. However, this high performance can come at a cost: because of the additional hyperparameters introduced to learn function approximators efficiently, off-policy RL can perform poorly on new problems. To address this hyperparameter sensitivity, we show how the correct choice of non-uniform sampling for experience replay can stabilize model performance under varying environmental conditions and hyperparameters.
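
To make the distinction between uniform and non-uniform experience replay concrete, here is a minimal sketch in Python. It is not the sampling scheme studied in the paper, which the abstract does not specify; it assumes a simple priority-proportional rule in which each stored transition is drawn with probability proportional to its priority raised to a power alpha. The class name ReplayBuffer, the alpha parameter, and the priority argument are hypothetical names chosen for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Illustrative fixed-capacity buffer (hypothetical, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        # Oldest entries are evicted automatically once capacity is reached.
        self.transitions = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        # alpha = 0 recovers uniform sampling; larger alpha skews sampling
        # further toward high-priority transitions (assumed rule).
        self.alpha = alpha
        self.rng = random.Random(seed)

    def add(self, transition, priority=1.0):
        # Priorities are assumed non-negative, with at least one positive
        # value in the buffer at sampling time.
        self.transitions.append(transition)
        self.priorities.append(float(priority))

    def sample_uniform(self, batch_size):
        # Every stored transition is equally likely (with replacement).
        return self.rng.choices(list(self.transitions), k=batch_size)

    def sample_non_uniform(self, batch_size):
        # Each transition is drawn with probability proportional to
        # priority ** alpha (with replacement).
        weights = [p ** self.alpha for p in self.priorities]
        return self.rng.choices(
            list(self.transitions), weights=weights, k=batch_size
        )
```

In practice the priority would come from some signal such as the magnitude of a transition's temporal-difference error, and the resulting sampling bias is often corrected with importance weights; both choices are outside what the abstract states and are left out of this sketch.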
