Timezone: »
Poster
Faster Non-asymptotic Convergence for Double Q-learning
Lin Zhao · Huaqing Xiong · Yingbin Liang
Double Q-learning (Hasselt, 2010) has gained significant success in practice due to its effectiveness in overcoming the overestimation issue of Q-learning. However, the theoretical understanding of double Q-learning is rather limited. The only existing finite-time analysis was recently established in (Xiong et al. 2020), where the polynomial learning rate adopted in the analysis typically yields a slower convergence rate. This paper tackles the more challenging case of a constant learning rate, and develops new analytical tools that improve the existing convergence rate by orders of magnitude. Specifically, we show that synchronous double Q-learning attains an $\epsilon$-accurate global optimum with a time complexity of $\tilde{\Omega}\left(\frac{\ln D}{(1-\gamma)^7\epsilon^2} \right)$, and the asynchronous algorithm achieves a time complexity of $\tilde{\Omega}\left(\frac{L}{(1-\gamma)^7\epsilon^2} \right)$, where $D$ is the cardinality of the state-action space, $\gamma$ is the discount factor, and $L$ is a parameter related to the sampling strategy for asynchronous double Q-learning. These results improve the existing convergence rate by the order of magnitude in terms of its dependence on all major parameters $(\epsilon,1-\gamma, D, L)$. This paper presents a substantial step toward the full understanding of the fast convergence of double-Q learning.
Author Information
Lin Zhao (National University of Singapore)
Huaqing Xiong (Ohio State University)
Yingbin Liang (The Ohio State University)
More from the Same Authors
-
2021 Spotlight: Provably Faster Algorithms for Bilevel Optimization »
Junjie Yang · Kaiyi Ji · Yingbin Liang -
2022 Poster: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 : Online Min-max Optimization: Nonconvexity, Nonstationarity, and Dynamic Regret »
Yu Huang · Yuan Cheng · Yingbin Liang · Longbo Huang -
2023 Poster: Provably Efficient Algorithm for Nonstationary Low-Rank MDPs »
Yuan Cheng · Jing Yang · Yingbin Liang -
2023 Poster: Finite-Time Analysis of Single-Timescale Actor-Critic »
Xuyang Chen · Lin Zhao -
2023 Poster: Non-Convex Bilevel Optimization with Time-Varying Objective Functions »
Sen Lin · Daouda Sow · Kaiyi Ji · Yingbin Liang · Ness Shroff -
2022 Spotlight: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 Spotlight: Lightning Talks 1A-3 »
Kimia Noorbakhsh · Ronan Perry · Qi Lyu · Jiawei Jiang · Christian Toth · Olivier Jeunen · Xin Liu · Yuan Cheng · Lei Li · Manuel Rodriguez · Julius von Kügelgen · Lars Lorch · Nicolas Donati · Lukas Burkhalter · Xiao Fu · Zhongdao Wang · Songtao Feng · Ciarán Gilligan-Lee · Rishabh Mehrotra · Fangcheng Fu · Jing Yang · Bernhard Schölkopf · Ya-Li Li · Christian Knoll · Maks Ovsjanikov · Andreas Krause · Shengjin Wang · Hong Zhang · Mounia Lalmas · Bolin Ding · Bo Du · Yingbin Liang · Franz Pernkopf · Robert Peharz · Anwar Hithnawi · Julius von Kügelgen · Bo Li · Ce Zhang -
2022 Spotlight: Provable Benefit of Multitask Representation Learning in Reinforcement Learning »
Yuan Cheng · Songtao Feng · Jing Yang · Hong Zhang · Yingbin Liang -
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang -
2022 Poster: On the Convergence Theory for Hessian-Free Bilevel Algorithms »
Daouda Sow · Kaiyi Ji · Yingbin Liang -
2022 Poster: Provable Benefit of Multitask Representation Learning in Reinforcement Learning »
Yuan Cheng · Songtao Feng · Jing Yang · Hong Zhang · Yingbin Liang -
2022 Poster: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying -
2021 Poster: Provably Faster Algorithms for Bilevel Optimization »
Junjie Yang · Kaiyi Ji · Yingbin Liang -
2020 Poster: Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters »
Kaiyi Ji · Jason Lee · Yingbin Liang · H. Vincent Poor -
2020 Poster: Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms »
Tengyu Xu · Zhe Wang · Yingbin Liang -
2020 Poster: Finite-Time Analysis for Double Q-learning »
Huaqing Xiong · Lin Zhao · Yingbin Liang · Wei Zhang -
2020 Spotlight: Finite-Time Analysis for Double Q-learning »
Huaqing Xiong · Lin Zhao · Yingbin Liang · Wei Zhang -
2019 Poster: SpiderBoost and Momentum: Faster Variance Reduction Algorithms »
Zhe Wang · Kaiyi Ji · Yi Zhou · Yingbin Liang · Vahid Tarokh -
2019 Poster: Finite-Sample Analysis for SARSA with Linear Function Approximation »
Shaofeng Zou · Tengyu Xu · Yingbin Liang -
2019 Poster: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples »
Tengyu Xu · Shaofeng Zou · Yingbin Liang -
2018 Poster: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang -
2018 Spotlight: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang -
2018 Poster: Minimax Estimation of Neural Net Distance »
Kaiyi Ji · Yingbin Liang