Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
Brahma Pavse · Josiah Hanna
Event URL: https://openreview.net/forum?id=Tf3lM56Z0j »
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $\pi_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may differ from $\pi_e$. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under $\pi_e$ is very different from the probability of that same pair occurring in $\mathcal{D}$ (Voloshin et al. 2021, Fu et al. 2021). In this work, we propose to improve the accuracy of OPE estimation by projecting the ground state-space into a lower-dimensional state-space using concepts from the state abstraction literature in RL. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms, which compute distribution correction ratios to produce their OPE estimate. In the original state-space, these ratios may have high variance, which in turn may lead to high-variance OPE. However, we prove that in the lower-dimensional abstract state-space the ratios can have lower variance, resulting in lower-variance OPE. We then present a minimax optimization problem that incorporates the state abstraction. Finally, our empirical evaluation on difficult, high-dimensional state-space OPE tasks shows that the abstract ratios can make MIS OPE estimators achieve lower mean-squared error and be more robust to hyperparameter tuning than the ground ratios.
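For context, a generic weighted-reward MIS estimate takes the schematic form below, where the paper's proposal corresponds to defining the correction ratios over abstract states $\phi(s)$ rather than ground states. This sketch is consistent with standard MIS estimators but is not the paper's exact objective; the notation $w$, $d_{\pi_e}$, and $d_{\mathcal{D}}$ is generic.

$$\hat{J}_{\mathrm{MIS}}(\pi_e) \;=\; \frac{1}{|\mathcal{D}|} \sum_{(s, a, r) \in \mathcal{D}} w\big(\phi(s), a\big)\, r, \qquad w(x, a) \approx \frac{d_{\pi_e}(x, a)}{d_{\mathcal{D}}(x, a)},$$

where $\phi$ is the state abstraction mapping ground states to abstract states; taking $\phi$ to be the identity map recovers the ordinary ground-state estimator.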
Author Information
Brahma Pavse (University of Wisconsin - Madison)
Josiah Hanna (University of Wisconsin - Madison)
More from the Same Authors
- 2021 : Safe Evaluation For Offline Learning: Are We Ready To Deploy? »
  Hager Radi · Josiah Hanna · Peter Stone · Matthew Taylor
- 2021 : Robust On-Policy Data Collection for Data-Efficient Policy Evaluation »
  Rujie Zhong · Josiah Hanna · Lukas Schäfer · Stefano Albrecht
- 2022 : Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning »
  Mhairi Dunion · Trevor McInroe · Kevin Sebastian Luck · Josiah Hanna · Stefano Albrecht
- 2023 Poster: State-Action Similarity-Based Representations for Off-Policy Evaluation »
  Brahma Pavse · Josiah Hanna
- 2023 Poster: Conditional Mutual Information for Disentangled Representations in Reinforcement Learning »
  Mhairi Dunion · Trevor McInroe · Kevin Sebastian Luck · Josiah Hanna · Stefano Albrecht
- 2023 Poster: Multi-task Representation Learning for Pure Exploration in Bilinear Bandits »
  Subhojyoti Mukherjee · Qiaomin Xie · Josiah Hanna · Robert Nowak
- 2022 Poster: Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning »
  Rujie Zhong · Duohan Zhang · Lukas Schäfer · Stefano Albrecht · Josiah Hanna