Poster
Planning in entropy-regularized Markov decision processes and games
Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko

Thu Dec 12th 10:45 AM -- 12:45 PM @ East Exhibition Hall B + C #217
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the SmoothCruiser. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.