Poster
Planning in entropy-regularized Markov decision processes and games
Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko
East Exhibition Hall B, C #217
Keywords: [ Reinforcement Learning and Planning ] [ Reinforcement Learning and Planning -> Markov Decision Processes ] [ Reinforcement Learning and Planning -> Planning ]
Abstract:
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
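For context, a standard form of the entropy-regularized (soft) Bellman operator is sketched below; this is a generic illustration of the setting rather than the paper's own notation, with regularization temperature $\lambda > 0$, discount factor $\gamma$, reward $r(s,a)$, and transition kernel $p(\cdot \mid s,a)$ all assumed for the example. The hard maximum over actions is replaced by a smooth log-sum-exp, which is the smoothness the abstract refers to:
\[
  (\mathcal{T}_{\lambda} V)(s)
  \;=\;
  \lambda \log \sum_{a \in \mathcal{A}}
    \exp\!\Big(
      \tfrac{1}{\lambda}\big( r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s,a)}[V(s')] \big)
    \Big),
\]
which recovers the usual (non-regularized) Bellman operator $\max_{a \in \mathcal{A}} \big( r(s,a) + \gamma\, \mathbb{E}_{s'}[V(s')] \big)$ in the limit $\lambda \to 0$.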