Skip to yearly menu bar Skip to main content


On Gap-dependent Bounds for Offline Reinforcement Learning

Xinqi Wang · Qiwen Cui · Simon Du

Hall J (level 1) #839

Keywords: [ gap-dependent ] [ offline reinforcement learning ]

Abstract: This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. Prior works showed when the density ratio between an optimal policy and the behavior policy is upper bounded (single policy coverage), then the agent can achieve an $O\left(\frac{1}{\epsilon^2}\right)$ rate, which is also minimax optimal. We show under the same single policy coverage assumption, the rate can be improved to $O\left(\frac{1}{\epsilon}\right)$ when there is a gap in the optimal $Q$-function. Furthermore, we show under a stronger uniform single policy coverage assumption, the sample complexity can be further improved to $O(1)$. Lastly, we also present nearly-matching lower bounds to complement our gap-dependent upper bounds.

Chat is not available.