Timezone: »

On Gap-dependent Bounds for Offline Reinforcement Learning
Xinqi Wang · Qiwen Cui · Simon Du

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #839
This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. Prior works showed when the density ratio between an optimal policy and the behavior policy is upper bounded (single policy coverage), then the agent can achieve an $O\left(\frac{1}{\epsilon^2}\right)$ rate, which is also minimax optimal. We show under the same single policy coverage assumption, the rate can be improved to $O\left(\frac{1}{\epsilon}\right)$ when there is a gap in the optimal $Q$-function. Furthermore, we show under a stronger uniform single policy coverage assumption, the sample complexity can be further improved to $O(1)$. Lastly, we also present nearly-matching lower bounds to complement our gap-dependent upper bounds.

Author Information

Xinqi Wang (Interdisciplinary Institute of Information and Science)
Qiwen Cui (Department of Computer Science, University of Washington)
Simon Du (University of Washington)

More from the Same Authors