

Poster

Zap Q-Learning

Adithya M Devraj · Sean Meyn

Pacific Ballroom #19

Keywords: [ Control Theory ] [ Markov Decision Processes ] [ Reinforcement Learning ] [ Decision and Control ] [ Stochastic Methods ] [ Online Learning ] [ Learning Theory ]


Abstract:

The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and its recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that its transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two-time-scale update equation for the matrix-gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even in non-ideal parameterized settings. Numerical experiments confirm quick convergence, even in such non-ideal cases.
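To make the two-time-scale matrix-gain structure concrete, here is a minimal sketch of a Zap-style Q-learning update on a small tabular MDP. The random MDP, the specific step-size exponents, the initialization of the gain-matrix estimate, and the use of a pseudo-inverse are illustrative assumptions, not the authors' exact experimental setup; the sketch only mirrors the update structure described in the abstract (a fast average of the linearization, and a Newton-Raphson-like parameter step using its inverse).

```python
# Hedged sketch of Zap Q-learning on a random tabular MDP (not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, beta = 6, 2, 0.9        # beta: discount factor
d = n_states * n_actions                     # parameter dimension (tabular basis)

# Random MDP (assumption): P[a, s] is a distribution over next states; R[s, a] is the reward.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.normal(size=(n_states, n_actions))

def psi(s, a):
    """One-hot basis vector for the state-action pair (s, a)."""
    e = np.zeros(d)
    e[s * n_actions + a] = 1.0
    return e

theta = np.zeros(d)          # Q(s, a) = theta . psi(s, a)
A_hat = -np.eye(d)           # running estimate of the mean linearization (init is an assumption)
s = 0
for n in range(1, 50_000):
    alpha = 1.0 / n          # slow time scale: parameter update
    gamma = n ** -0.85       # fast time scale: matrix-gain update (exponent in (0.5, 1))

    a = int(rng.integers(n_actions))                  # exploratory (uniform) behavior policy
    s_next = rng.choice(n_states, p=P[a, s])
    r = R[s, a]

    q = theta.reshape(n_states, n_actions)
    a_greedy = int(np.argmax(q[s_next]))              # greedy action at the next state
    td = r + beta * q[s_next, a_greedy] - q[s, a]     # temporal-difference error

    # One-sample estimate of the linearization, averaged on the fast time scale.
    A_n = np.outer(psi(s, a), beta * psi(s_next, a_greedy) - psi(s, a))
    A_hat += gamma * (A_n - A_hat)

    # Newton-Raphson-like step: matrix gain is (approximately) -A_hat^{-1};
    # the pseudo-inverse guards against a singular estimate early on.
    theta -= alpha * np.linalg.pinv(A_hat) @ (psi(s, a) * td)
    s = s_next

print(theta.reshape(n_states, n_actions))             # learned Q-table
```

The two step sizes are what make this "two-time-scale": gamma decays more slowly than alpha, so the gain-matrix estimate tracks the current linearization faster than the parameters move, which is what lets the trajectory shadow a deterministic Newton-Raphson flow.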
