Poster
Finite-Time Analysis for Double Q-learning
Huaqing Xiong · Lin Zhao · Yingbin Liang · Wei Zhang
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~\citet{hasselt2010double} overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum by taking
$\tilde{\Omega}\left(\left( \frac{1}{(1-\gamma)^6\epsilon^2}\right)^{\frac{1}{\omega}} +\left(\frac{1}{1-\gamma}\right)^{\frac{1}{1-\omega}}\right)$ iterations, where $\omega\in(0,1)$ is the decay parameter of the learning rate, and $\gamma$ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two interconnected stochastic processes, which is new to the literature of stochastic approximation.
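The algorithm the abstract describes — two Q-estimators, with a coin flip deciding which one is updated while the other evaluates the greedy action — can be sketched in minimal tabular form. The sketch below is an illustration of van Hasselt's double Q-learning update, not the paper's exact analyzed setting; the polynomially decaying step size $\alpha_t = 1/t^\omega$ mirrors the decay parameter $\omega$ in the bound, and the synchronous variant (every state-action pair updated each iteration) is assumed.

```python
import numpy as np

def double_q_learning(P, R, gamma=0.9, omega=0.6, iterations=20000, seed=0):
    """Synchronous tabular double Q-learning (van Hasselt, 2010).

    P: transition kernel, shape (S, A, S); R: expected rewards, shape (S, A).
    Two estimators Q_A and Q_B are maintained; each update a fair coin picks
    which one to update, and the *other* estimator evaluates the greedy
    action, counteracting the overestimation bias of plain Q-learning.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    QA, QB = np.zeros((S, A)), np.zeros((S, A))
    for t in range(1, iterations + 1):
        alpha = 1.0 / t**omega          # polynomially decaying learning rate
        for s in range(S):              # synchronous: every (s, a) is sampled
            for a in range(A):
                s_next = rng.choice(S, p=P[s, a])
                if rng.random() < 0.5:
                    a_star = QA[s_next].argmax()                   # greedy w.r.t. Q_A ...
                    target = R[s, a] + gamma * QB[s_next, a_star]  # ... evaluated by Q_B
                    QA[s, a] += alpha * (target - QA[s, a])
                else:
                    b_star = QB[s_next].argmax()                   # greedy w.r.t. Q_B ...
                    target = R[s, a] + gamma * QA[s_next, b_star]  # ... evaluated by Q_A
                    QB[s, a] += alpha * (target - QB[s, a])
    return QA, QB
```

On a small deterministic MDP both estimators converge toward the optimal Q-function; the random switching means each estimator receives only about half of the updates, which is one source of the trade-off governed by $\omega$ in the iteration bound above.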
Author Information
Huaqing Xiong (Ohio State University)
Lin Zhao (National University of Singapore)
Yingbin Liang (The Ohio State University)
Wei Zhang (Southern University of Science and Technology)
Related Events (a corresponding poster, oral, or spotlight)

2020 Spotlight: Finite-Time Analysis for Double Q-learning »
Tue. Dec 8th 03:20 - 03:30 PM Room: Orals & Spotlights: Reinforcement Learning
More from the Same Authors

2021 Spotlight: Provably Faster Algorithms for Bilevel Optimization »
Junjie Yang · Kaiyi Ji · Yingbin Liang 
2022 Poster: On the Convergence Theory for Hessian-Free Bilevel Algorithms »
Daouda Sow · Kaiyi Ji · Yingbin Liang 
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang 
2022 Poster: Provable Benefit of Multitask Representation Learning in Reinforcement Learning »
Yuan Cheng · Songtao Feng · Jing Yang · Hong Zhang · Yingbin Liang 
2022 Poster: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang 
2022 Poster: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying 
2021 Poster: Faster Non-asymptotic Convergence for Double Q-learning »
Lin Zhao · Huaqing Xiong · Yingbin Liang 
2021 Poster: Provably Faster Algorithms for Bilevel Optimization »
Junjie Yang · Kaiyi Ji · Yingbin Liang 
2020 Poster: Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters »
Kaiyi Ji · Jason Lee · Yingbin Liang · H. Vincent Poor 
2020 Poster: Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms »
Tengyu Xu · Zhe Wang · Yingbin Liang 
2019 Poster: SpiderBoost and Momentum: Faster Variance Reduction Algorithms »
Zhe Wang · Kaiyi Ji · Yi Zhou · Yingbin Liang · Vahid Tarokh 
2019 Poster: Finite-Sample Analysis for SARSA with Linear Function Approximation »
Shaofeng Zou · Tengyu Xu · Yingbin Liang 
2019 Poster: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples »
Tengyu Xu · Shaofeng Zou · Yingbin Liang 
2018 Poster: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang 
2018 Spotlight: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang 
2018 Poster: Minimax Estimation of Neural Net Distance »
Kaiyi Ji · Yingbin Liang