We consider offline Reinforcement Learning (RL), where the agent does not interact with the environment and must rely on offline data collected using a behavior policy. Previous works provide policy evaluation guarantees when the target policy to be evaluated is covered by the behavior policy, that is, state-action pairs visited by the target policy must also be visited by the behavior policy. We show that when the MDP has a latent low-rank structure, this coverage condition can be relaxed. Building on the connection to weighted matrix completion with non-uniform observations, we propose an offline policy evaluation algorithm that leverages the low-rank structure to estimate the values of uncovered state-action pairs. Our algorithm does not require a known feature representation, and our finite-sample error bound involves a novel measure quantifying the discrepancy between the behavior and target policies in the spectral space. We provide concrete examples where our algorithm achieves accurate estimation while existing coverage conditions are not satisfied.
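To illustrate why low-rank structure can stand in for coverage, here is a minimal sketch. It is not the algorithm from the paper: it swaps in a simple noiseless cross (CUR-style) completion rather than weighted matrix completion, and the matrix sizes, the factor values, the anchor rows and columns, and the helper name `complete_from_cross` are all assumptions made for illustration. The behavior data is assumed to reveal only two states fully and two actions fully, yet every other state-action value of the rank-2 matrix is recovered.

```python
import numpy as np


def complete_from_cross(q_obs, anchor_rows, anchor_cols):
    """Fill a low-rank matrix from fully observed anchor rows and columns.

    Hypothetical helper: assumes the matrix has rank r, the anchor rows and
    columns are observed without noise, and their r x r intersection has
    rank r.
    """
    C = q_obs[:, anchor_cols]                     # all states, anchor actions
    R = q_obs[anchor_rows, :]                     # anchor states, all actions
    W = q_obs[np.ix_(anchor_rows, anchor_cols)]   # intersection block
    return C @ np.linalg.pinv(W) @ R


# Assumed rank-2 state-action value matrix Q = U V^T (6 states, 5 actions).
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [2.0, -1.0], [0.5, 3.0], [-1.0, 2.0]])
V = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 2.0],
              [3.0, -1.0], [-2.0, 1.0]])
Q_true = U @ V.T

# The behavior policy is assumed to cover only states 0-1 (all actions)
# and actions 0-1 (all states); everything else is unobserved.
anchor_rows, anchor_cols = [0, 1], [0, 1]
mask = np.zeros_like(Q_true, dtype=bool)
mask[anchor_rows, :] = True
mask[:, anchor_cols] = True
Q_obs = np.where(mask, Q_true, 0.0)  # unobserved entries zeroed out

Q_hat = complete_from_cross(Q_obs, anchor_rows, anchor_cols)

# Worst-case error on the state-action pairs the behavior data never saw.
max_err_uncovered = np.max(np.abs(Q_hat - Q_true)[~mask])
print(f"max error on uncovered pairs: {max_err_uncovered:.2e}")
```

In this toy setting a target policy that only visits uncovered pairs can still be evaluated from `Q_hat`. With noisy, non-uniformly sampled data the exact cross formula no longer applies, which is where a weighted matrix completion estimator and a finite-sample error bound, as described in the abstract, would come in.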
Author Information
Xumei Xi (Cornell University)
Christina Yu (Cornell University)
Yudong Chen (University of Wisconsin - Madison)
More from the Same Authors
-
2022 : A Causal Inference Framework for Network Interference with Panel Data »
Sarah Cen · Anish Agarwal · Christina Yu · Devavrat Shah -
2022 : Exploiting Neighborhood Interference with Low Order Interactions under Unit Randomized Design »
Mayleen Cortez · Matthew Eichhorn · Christina Yu -
2022 Poster: Staggered Rollout Designs Enable Causal Inference Under Interference Without Network Knowledge »
Mayleen Cortez · Matthew Eichhorn · Christina Yu -
2021 Poster: Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang -
2021 Poster: Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery »
Lijun Ding · Liwei Jiang · Yudong Chen · Qing Qu · Zhihui Zhu -
2020 Poster: Adaptive Discretization for Model-Based Reinforcement Learning »
Sean Sinclair · Tianyu Wang · Gauri Jain · Siddhartha Banerjee · Christina Yu -
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery »
Jicong Fan · Lijun Ding · Yudong Chen · Madeleine Udell -
2019 Poster: Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric »
Nirandika Wanigasekara · Christina Yu -
2019 Poster: Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities »
Wei Qian · Yuqian Zhang · Yudong Chen -
2017 : Iterative Collaborative Filtering for Sparse Matrix Estimation »
Christina Lee -
2017 Workshop: Nearest Neighbors for Modern Applications with Massive Data: An Age-old Solution with New Challenges »
George H Chen · Devavrat Shah · Christina Lee -
2017 Poster: Thy Friend is My Friend: Iterative Collaborative Filtering for Sparse Matrix Estimation »
Christian Borgs · Jennifer Chayes · Christina Lee · Devavrat Shah -
2016 Poster: Fast Algorithms for Robust PCA via Gradient Descent »
Xinyang Yi · Dohyung Park · Yudong Chen · Constantine Caramanis -
2016 Poster: Blind Regression: Nonparametric Regression for Latent Variable Models via Collaborative Filtering »
Dogyoon Song · Christina Lee · Yihua Li · Devavrat Shah -
2014 Poster: Clustering from Labels and Time-Varying Graphs »
Shiau Hong Lim · Yudong Chen · Huan Xu -
2014 Spotlight: Clustering from Labels and Time-Varying Graphs »
Shiau Hong Lim · Yudong Chen · Huan Xu -
2013 Poster: Computing the Stationary Distribution Locally »
Christina Lee · Asuman Ozdaglar · Devavrat Shah -
2012 Poster: Clustering Sparse Graphs »
Yudong Chen · Sujay Sanghavi · Huan Xu