Poster
On Gap-dependent Bounds for Offline Reinforcement Learning
Xinqi Wang · Qiwen Cui · Simon Du
This paper presents a systematic study of gap-dependent sample complexity in offline reinforcement learning. Prior work showed that when the density ratio between an optimal policy and the behavior policy is upper bounded (single policy coverage), the agent can achieve an $O\left(\frac{1}{\epsilon^2}\right)$ rate, which is also minimax optimal. We show that under the same single policy coverage assumption, the rate can be improved to $O\left(\frac{1}{\epsilon}\right)$ when there is a gap in the optimal $Q$-function. Furthermore, we show that under a stronger uniform single policy coverage assumption, the sample complexity can be further improved to $O(1)$. Lastly, we present nearly matching lower bounds to complement our gap-dependent upper bounds.
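For readers unfamiliar with the two quantities the abstract relies on, the following is a sketch of how they are commonly formalized in the tabular, finite-horizon setting. The notation is an assumption for illustration and is not taken from the abstract: $d_h^{\pi^*}(s,a)$ denotes the state-action occupancy of an optimal policy at step $h$, $\mu_h(s,a)$ the occupancy of the behavior policy that generated the data, and $C^*$ the coverage constant. The paper's own definitions may differ in detail.

```latex
% Single policy coverage (assumed notation): the density ratio between
% an optimal policy and the behavior policy is bounded by a constant C^*.
\[
  \max_{h,s,a} \frac{d_h^{\pi^*}(s,a)}{\mu_h(s,a)} \le C^* .
\]

% Gap in the optimal Q-function (assumed notation): the value lost by
% taking action a instead of an optimal action at state s and step h.
\[
  \mathrm{gap}_h(s,a) = V_h^*(s) - Q_h^*(s,a),
  \qquad
  \mathrm{gap}_{\min} = \min_{h,s,a}
    \bigl\{ \mathrm{gap}_h(s,a) : \mathrm{gap}_h(s,a) > 0 \bigr\} .
\]
```

Under this reading, "a gap in the optimal $Q$-function" means $\mathrm{gap}_{\min} > 0$, so every suboptimal action is worse than the best action by at least a fixed margin.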
Author Information
Xinqi Wang (Interdisciplinary Institute of Information and Science)
Qiwen Cui (Department of Computer Science, University of Washington)
Simon Du (University of Washington)
More from the Same Authors
- 2022 Poster: Provable General Function Class Representation Learning in Multitask Bandits and MDP
  Rui Lu · Andrew Zhao · Simon Du · Gao Huang
- 2022: Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization
  Runlong Zhou · Yuandong Tian · YI WU · Simon Du
- 2022 Spotlight: Lightning Talks 4A-4
  Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare
- 2022 Spotlight: Provable General Function Class Representation Learning in Multitask Bandits and MDP
  Rui Lu · Andrew Zhao · Simon Du · Gao Huang
- 2022 Poster: When are Offline Two-Player Zero-Sum Markov Games Solvable?
  Qiwen Cui · Simon Du
- 2022 Poster: Learning in Congestion Games with Bandit Feedback
  Qiwen Cui · Zhihan Xiong · Maryam Fazel · Simon Du
- 2022 Poster: Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
  Qiwen Cui · Simon Du
- 2022 Poster: Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
  Zhihan Xiong · Ruoqi Shen · Qiwen Cui · Maryam Fazel · Simon Du
- 2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning?
  Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine