Poster
Planning in entropy-regularized Markov decision processes and games
Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko
East Exhibition Hall B, C #217
Keywords: [ Reinforcement Learning and Planning ] [ Reinforcement Learning and Planning -> Markov Decision Processes ] [ Reinforcement Learning and Planning -> Planning ]
Abstract:
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
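For context, a standard form of the entropy-regularized (soft) Bellman operator is sketched below; this is a generic illustration of the setting rather than the paper's own notation, with regularization temperature $\lambda > 0$, discount factor $\gamma$, reward $r(s,a)$, and transition kernel $p(\cdot \mid s,a)$ all assumed for the example. The hard maximum over actions is replaced by a smooth log-sum-exp, which is the smoothness the abstract refers to:
\[
  (\mathcal{T}_{\lambda} V)(s)
  \;=\;
  \lambda \log \sum_{a \in \mathcal{A}}
    \exp\!\Big(
      \tfrac{1}{\lambda}\big( r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s,a)}[V(s')] \big)
    \Big),
\]
which recovers the usual (non-regularized) Bellman operator $\max_{a \in \mathcal{A}} \big( r(s,a) + \gamma\, \mathbb{E}_{s'}[V(s')] \big)$ in the limit $\lambda \to 0$.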