
Zap Q-Learning With Nonlinear Function Approximation
Shuhang Chen · Adithya M Devraj · Fan Lu · Ana Busic · Sean Meyn

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #549

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting and optimal stopping. This paper introduces a new framework for the analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to the choice of function approximation architecture.
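The abstract refers to Zap algorithms as a special class of stochastic approximation recursions with a matrix gain: alongside the parameter update, a second, faster recursion tracks an estimate of the mean-field Jacobian, whose inverse is used as a Newton-like gain. The sketch below illustrates that two-time-scale structure on a toy linear root-finding problem; it is an assumption-laden illustration, not the authors' Q-learning implementation, and the problem instance (matrix `A`, noise levels, step-size exponents) is invented for the example.

```python
import numpy as np

# Hedged sketch of a Zap-style stochastic approximation recursion:
#   Ahat_{n+1}  = Ahat_n + gamma_n * (A_obs - Ahat_n)        (fast time scale)
#   theta_{n+1} = theta_n - alpha_n * Ahat_{n+1}^{-1} f_obs  (slow time scale)
# applied to finding the root of f(theta) = A @ theta + b from noisy samples.

rng = np.random.default_rng(0)
d = 3
W = rng.standard_normal((d, d))
A = -(np.eye(d) + 0.1 * W @ W.T)      # symmetric negative definite mean field
b = rng.standard_normal(d)
theta_star = np.linalg.solve(A, -b)   # the root the recursion should find

theta = np.zeros(d)
Ahat = -np.eye(d)                     # initial gain (Jacobian) estimate

for n in range(1, 20001):
    alpha = 1.0 / n                   # slow step size for theta
    gamma = 1.0 / n**0.85             # faster step size for the matrix gain
    A_obs = A + 0.1 * rng.standard_normal((d, d))  # noisy Jacobian sample
    f_obs = A @ theta + b + 0.1 * rng.standard_normal(d)
    Ahat += gamma * (A_obs - Ahat)
    theta -= alpha * np.linalg.solve(Ahat, f_obs)  # Newton-like Zap step

print(np.linalg.norm(theta - theta_star))
```

Because the gain tracks the Jacobian, the slow recursion behaves like a stochastic Newton method, which is the acceleration mechanism the abstract alludes to; in the paper this structure is applied to the Q-learning mean-field equations rather than a linear toy problem.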

Author Information

Shuhang Chen (University of Florida)
Adithya M Devraj (Stanford University)
Fan Lu (University of Florida)
Ana Busic (INRIA)
Sean Meyn (University of Florida)
