Poster
Zap Q-Learning
Adithya M Devraj · Sean Meyn
The Zap Q-learning algorithm introduced in this paper improves upon Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that its transient behavior closely matches a deterministic Newton-Raphson implementation. This is made possible by a two-time-scale update equation for the matrix-gain sequence. The analysis suggests that the approach leads to stable and efficient computation even in non-ideal parameterized settings. Numerical experiments confirm rapid convergence, even in such non-ideal cases.
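The two-time-scale structure described above can be sketched in code: the matrix gain is tracked with a faster step size (1/n^ρ, ρ < 1) than the parameter vector (1/n), giving a Newton-Raphson-like update. The sketch below uses a tabular MDP with a one-hot basis; the function name, the exponent ρ = 0.85, and the pseudo-inverse safeguard for the early (possibly singular) matrix-gain estimate are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def zap_q_learning(P, R, gamma=0.9, n_steps=5000, rho=0.85, seed=0):
    """Hedged sketch of a Zap-style Q-learning update on a tabular MDP.

    P: transition tensor, shape (n_states, n_actions, n_states)
    R: reward matrix, shape (n_states, n_actions)
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    d = n_states * n_actions                 # tabular basis: one-hot per (s, a)
    theta = np.zeros(d)                      # Q-function parameters
    A_hat = -np.eye(d)                       # running matrix-gain estimate
    s = 0
    for n in range(1, n_steps + 1):
        a = rng.integers(n_actions)          # uniform exploratory behavior policy
        s_next = rng.choice(n_states, p=P[s, a])
        psi = np.zeros(d)
        psi[s * n_actions + a] = 1.0
        q_next = theta.reshape(n_states, n_actions)[s_next]
        a_star = int(np.argmax(q_next))
        psi_next = np.zeros(d)
        psi_next[s_next * n_actions + a_star] = 1.0
        # Temporal-difference error and the rank-one linearization term
        td = R[s, a] + gamma * q_next[a_star] - theta @ psi
        A_n = np.outer(psi, gamma * psi_next - psi)
        # Faster time scale (1/n^rho): track the mean of A_n
        A_hat += (1.0 / (n + 1) ** rho) * (A_n - A_hat)
        # Slower time scale (1/n): Newton-Raphson-like step with gain -A_hat^{-1};
        # pseudo-inverse guards against a nearly singular early estimate
        theta -= (1.0 / n) * (np.linalg.pinv(A_hat) @ (psi * td))
        s = s_next
    return theta.reshape(n_states, n_actions)
```

With a tabular (one-hot) basis the parameterization is ideal, so the iterates should approach the optimal Q-function; the paper's point is that the matrix gain keeps the recursion stable and efficient even when the parameterization is not ideal.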
Author Information
Adithya M Devraj (University of Florida)
Sean Meyn (University of Florida)
More from the Same Authors
- 2022 Spotlight: Approaching Quartic Convergence Rates for Quasi-Stochastic Approximation with Application to Gradient-Free Optimization
  Caio Kalil Lauand · Sean Meyn
- 2022 Spotlight: Lightning Talks 2A-1
  Caio Kalil Lauand · Ryan Strauss · Yasong Feng · lingyu gu · Alireza Fathollah Pour · Oren Mangoubi · Jianhao Ma · Binghui Li · Hassan Ashtiani · Yongqi Du · Salar Fattahi · Sean Meyn · Jikai Jin · Nisheeth Vishnoi · zengfeng Huang · Junier B Oliva · yuan zhang · Han Zhong · Tianyu Wang · John Hopcroft · Di Xie · Shiliang Pu · Liwei Wang · Robert Qiu · Zhenyu Liao
- 2020 Poster: Zap Q-Learning With Nonlinear Function Approximation
  Shuhang Chen · Adithya M Devraj · Fan Lu · Ana Busic · Sean Meyn
- 2019 Poster: Stochastic Variance Reduced Primal Dual Algorithms for Empirical Composition Optimization
  Adithya M Devraj · Jianshu Chen