Benchmarking Sample Selection Strategies for Batch Reinforcement Learning
Yuwei Fu · Di Wu · Benoit Boulet

Training sample selection techniques, such as prioritized experience replay (PER), have been recognized as significantly important for online reinforcement learning algorithms: efficient sample selection can improve both the learning efficiency and the final performance. However, the impact of sample selection on batch reinforcement learning (RL) has not been well studied. In this work, we investigate the application of non-uniform sampling techniques in batch RL. In particular, we compare six variants of PER built on heuristic priority metrics that target different aspects of the offline learning setting: temporal-difference error, n-step return, self-imitation learning objective, pseudo-count, uncertainty, and likelihood. Through extensive experiments on the standard batch RL datasets, we find that non-uniform sampling is also effective in batch RL settings, but that no single metric works in all situations. Our investigation also shows that changing the sampling scheme alone is insufficient to avoid the bootstrapping error in batch reinforcement learning.
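To make the non-uniform sampling idea concrete, the following is a minimal sketch of PER-style sampling over a fixed offline dataset. It assumes per-transition priority scores (here, absolute TD errors) are already available; the function names, hyperparameter values, and example data are illustrative, not taken from the paper. Any of the six priority metrics above could be substituted for the TD error as the raw priority signal.

```python
import numpy as np

def per_probabilities(priorities, alpha=0.6, eps=1e-6):
    """Convert raw priority scores into sampling probabilities,
    p_i proportional to (|priority_i| + eps)^alpha."""
    scaled = (np.abs(priorities) + eps) ** alpha
    return scaled / scaled.sum()

def sample_batch(rng, priorities, batch_size, alpha=0.6, beta=0.4):
    """Sample transition indices non-uniformly and return the
    importance-sampling weights that correct the induced bias."""
    probs = per_probabilities(priorities, alpha)
    idx = rng.choice(len(priorities), size=batch_size, p=probs)
    # w_i = (N * p_i)^(-beta), normalized by the max for stability.
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

rng = np.random.default_rng(0)
td_errors = np.array([0.1, 2.0, 0.5, 0.05, 1.0])  # illustrative priorities
idx, w = sample_batch(rng, td_errors, batch_size=3)
```

In an offline setting the dataset is fixed, so only the priorities (not the transitions) are updated between gradient steps; the weights `w` would multiply the per-sample loss.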

Author Information

Yuwei Fu (McGill University)
Di Wu (McGill)
Benoit Boulet (McGill)
