Workshop: Offline Reinforcement Learning

Benchmarking Sample Selection Strategies for Batch Reinforcement Learning

Yuwei Fu · Di Wu · Benoit Boulet


Training sample selection techniques, such as prioritized experience replay (PER), are widely recognized as important for online reinforcement learning algorithms. Efficient sample selection can improve both learning efficiency and final performance. However, the impact of sample selection on batch reinforcement learning (RL) has not been well studied. In this work, we investigate the application of non-uniform sampling techniques in batch RL. In particular, we compare six variants of PER based on heuristic priority metrics that focus on different aspects of the offline learning setting. These metrics are temporal-difference error, n-step return, self-imitation learning objective, pseudo-count, uncertainty, and likelihood. Through extensive experiments on standard batch RL datasets, we find that non-uniform sampling is also effective in batch RL settings. Further, no single metric works in all situations. The investigation also shows that changing the sampling scheme alone is insufficient to avoid the bootstrapping error in batch RL.
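To make the idea of non-uniform sampling over a fixed dataset concrete, the following is a minimal sketch of proportional prioritization as commonly described for PER, not the authors' implementation. It assumes each transition already has a scalar priority score (for example an absolute temporal-difference error). The function names, the exponents `alpha` and `beta`, their default values, and the toy data are all illustrative assumptions that do not come from the abstract.

```python
import random


def per_probabilities(priorities, alpha=0.6, eps=1e-6):
    """Map raw priority scores to sampling probabilities.

    Uses the proportional form P(i) = (|p_i| + eps)^alpha / sum_k (|p_k| + eps)^alpha.
    alpha = 0 recovers uniform sampling. The default values are assumptions.
    """
    scaled = [(abs(p) + eps) ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), scaled so max(w) = 1."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_batch(dataset, priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Draw a mini-batch (with replacement) from a fixed offline dataset.

    Returns the sampled transitions, their importance weights, and their indices.
    """
    rng = rng or random.Random()
    probs = per_probabilities(priorities, alpha)
    weights = importance_weights(probs, beta)
    indices = rng.choices(range(len(dataset)), weights=probs, k=batch_size)
    batch = [dataset[i] for i in indices]
    batch_weights = [weights[i] for i in indices]
    return batch, batch_weights, indices


# Toy offline dataset: (state, action, reward, next_state) tuples (hypothetical).
toy_dataset = [(0, 1, 0.0, 1), (1, 0, 1.0, 2), (2, 1, 0.0, 3), (3, 0, 5.0, 0)]
# Hypothetical priority scores, e.g. absolute TD errors from a current Q estimate.
toy_priorities = [0.1, 0.5, 0.05, 2.0]

batch, batch_weights, indices = sample_batch(
    toy_dataset, toy_priorities, batch_size=3, rng=random.Random(0)
)
```

Swapping in a different metric from the list above (n-step return, pseudo-count, and so on) would, under these assumptions, only change how the priority scores are computed; the sampling step stays the same.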
