Poster
Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Jiafan He · Dongruo Zhou · Quanquan Gu
Keywords: [ Reinforcement Learning and Planning ]
Abstract:
We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) under the tabular setting. We propose a model-based algorithm named UCBVI-$\gamma$, which is based on the \emph{optimism in the face of uncertainty} principle and the Bernstein-type bonus. We show that UCBVI-$\gamma$ achieves an $\tilde{O}\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$ regret, where $S$ is the number of states, $A$ is the number of actions, $\gamma$ is the discount factor and $T$ is the number of steps. In addition, we construct a class of hard MDPs and show that for any algorithm, the expected regret is at least $\Omega\big(\sqrt{SAT}/(1-\gamma)^{1.5}\big)$. Our upper bound matches the minimax lower bound up to logarithmic factors, which suggests that UCBVI-$\gamma$ is nearly minimax optimal for discounted MDPs.
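To make the "optimism plus Bernstein-type bonus" idea concrete, below is a minimal NumPy sketch of optimistic value iteration with a variance-aware exploration bonus, assuming rewards in $[0, 1]$ so that values are capped at $1/(1-\gamma)$. The function name, constants, and exact bonus form are illustrative assumptions for a generic Bernstein-style bonus, not the precise UCBVI-$\gamma$ update rule from the paper.

```python
import numpy as np

def optimistic_value_iteration(counts, rewards_sum, gamma, t, num_iters=200):
    """Illustrative optimistic value iteration with a Bernstein-style bonus.

    counts[s, a, s']  : visit counts for transitions (s, a) -> s'
    rewards_sum[s, a] : cumulative observed reward at (s, a)
    gamma             : discount factor, t: current step count

    This is a generic sketch of the technique, not the exact
    UCBVI-gamma algorithm analyzed in the paper.
    """
    S, A, _ = counts.shape
    n_sa = np.maximum(counts.sum(axis=2), 1)          # visits to each (s, a)
    p_hat = counts / n_sa[:, :, None]                 # empirical transition model
    r_hat = rewards_sum / n_sa                        # empirical mean rewards
    log_term = np.log(max(S * A * t, 2))              # illustrative log factor
    v_max = 1.0 / (1.0 - gamma)                       # value cap for rewards in [0, 1]

    q = np.zeros((S, A))
    for _ in range(num_iters):
        v = q.max(axis=1)                             # greedy value estimate
        ev = p_hat @ v                                # E_{s' ~ p_hat}[V(s')]
        var_v = np.maximum(p_hat @ (v ** 2) - ev ** 2, 0.0)  # Var_{s'}[V(s')]
        # Bernstein-type bonus: variance-aware main term + lower-order term.
        bonus = np.sqrt(2.0 * var_v * log_term / n_sa) + v_max * log_term / n_sa
        q = np.minimum(r_hat + bonus + gamma * ev, v_max)
    return q
```

The key design point the abstract refers to is that the bonus scales with the empirical variance of the next-state value rather than with the worst-case range $1/(1-\gamma)$; this variance-aware (Bernstein-type) term is what lets the regret analysis shave the extra $1/(1-\gamma)$ factors that a Hoeffding-type bonus would incur.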