This paper considers how to complement offline reinforcement learning (RL) data with additional data collection for the task of policy evaluation. In policy evaluation, the task is to estimate the expected return of an evaluation policy on an environment of interest. Prior work on offline policy evaluation typically only considers a static dataset. We consider a setting where we can collect a small amount of additional data to combine with a potentially larger offline RL dataset. We show that simply running the evaluation policy, i.e., on-policy data collection, is sub-optimal for this setting. We then introduce two new data collection strategies for policy evaluation, both of which take previously collected data into account when collecting future data, so as to reduce sampling error (distribution shift) in the entire dataset. Our empirical results show that, compared to on-policy sampling, our strategies produce data with lower sampling error and generally lead to lower mean-squared error in policy evaluation for any total dataset size. We also show that these strategies can start from initial off-policy data, collect additional data, and then use both the initial and new data to produce low mean-squared error policy evaluation without using off-policy corrections.
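To make the contrast the abstract draws concrete, below is a minimal sketch. The one-step environment, the action distribution, and the deficit-based collection rule are all illustrative assumptions, not the paper's algorithm or results: on-policy sampling draws actions i.i.d. from the evaluation policy, while the alternative tracks the empirical action distribution of the data collected so far and steers new samples toward under-represented actions, reducing sampling error in the whole dataset. Both datasets are then averaged directly, with no off-policy corrections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step environment: 3 discrete actions with noisy rewards.
TRUE_MEANS = np.array([1.0, 0.5, -0.2])

def step(action):
    return TRUE_MEANS[action] + rng.normal(scale=0.5)

pi_e = np.array([0.2, 0.5, 0.3])   # evaluation policy (distribution over actions)
print("true value:", round(float(pi_e @ TRUE_MEANS), 3))

def on_policy(n):
    """Plain on-policy collection: actions drawn i.i.d. from pi_e."""
    return rng.choice(3, size=n, p=pi_e)

def low_error(n):
    """Illustrative low-sampling-error collection (an assumption, not the
    paper's exact method): always pick the action whose empirical frequency
    in the data collected so far falls furthest below its probability
    under pi_e."""
    counts = np.zeros(3)
    actions = []
    for t in range(n):
        deficit = pi_e - counts / max(t, 1)
        a = int(np.argmax(deficit))
        actions.append(a)
        counts[a] += 1
    return np.array(actions)

for collect in (on_policy, low_error):
    actions = collect(200)
    estimate = np.mean([step(a) for a in actions])   # no off-policy corrections
    empirical = np.bincount(actions, minlength=3) / len(actions)
    print(f"{collect.__name__:>10}: estimate {estimate:.3f}, "
          f"sampling error {np.abs(empirical - pi_e).sum():.3f}")
```

Because the deficit rule drives the empirical action frequencies toward pi_e far faster than i.i.d. draws do, the plain Monte Carlo average remains accurate without importance weighting, which mirrors the abstract's claim that the collected data can be used without off-policy corrections.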
Author Information
Rujie Zhong (University of Edinburgh)
Josiah Hanna (University of Wisconsin -- Madison)
Lukas Schäfer (University of Edinburgh)
Stefano Albrecht (University of Edinburgh)
More from the Same Authors
- 2021 : Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks »
  Georgios Papoudakis · Filippos Christianos · Lukas Schäfer · Stefano Albrecht
- 2021 : Safe Evaluation For Offline Learning: Are We Ready To Deploy? »
  Hager Radi · Josiah Hanna · Peter Stone · Matthew Taylor
- 2022 : Enhancing Transfer of Reinforcement Learning Agents with Abstract Contextual Embeddings »
  Guy Azran · Mohamad Hosein Danesh · Stefano Albrecht · Sarah Keren
- 2022 : Verifiable Goal Recognition for Autonomous Driving with Occlusions »
  Cillian Brewitt · Massimiliano Tamborski · Stefano Albrecht
- 2022 : Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction »
  Brahma Pavse · Josiah Hanna
- 2022 : Sample Relationships through the Lens of Learning Dynamics with Label Information »
  Shangmin Guo · Yi Ren · Stefano Albrecht · Kenny Smith
- 2022 : Learning Representations for Reinforcement Learning with Hierarchical Forward Models »
  Trevor McInroe · Lukas Schäfer · Stefano Albrecht
- 2022 : Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning »
  Mhairi Dunion · Trevor McInroe · Kevin Sebastian Luck · Josiah Hanna · Stefano Albrecht
- 2022 Poster: Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning »
  Rujie Zhong · Duohan Zhang · Lukas Schäfer · Stefano Albrecht · Josiah Hanna
- 2021 Poster: Agent Modelling under Partial Observability for Deep Reinforcement Learning »
  Georgios Papoudakis · Filippos Christianos · Stefano Albrecht
- 2020 Poster: Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning »
  Filippos Christianos · Lukas Schäfer · Stefano Albrecht