Timezone: »
Spotlight
FiniteTime Analysis for Double Qlearning
Huaqing Xiong · Lin Zhao · Yingbin Liang · Wei Zhang
Tue Dec 08 07:20 AM  07:30 AM (PST) @ Orals & Spotlights: Reinforcement Learning
Although Qlearning is one of the most successful algorithms for finding the best actionvalue function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Qfunction values incurred by random sampling. The double Qlearning algorithm proposed in~\citet{hasselt2010double} overcomes such an overestimation issue by randomly switching the update between two Qestimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Qlearning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first nonasymptotic (i.e., finitetime) analysis for double Qlearning. We show that both synchronous and asynchronous double Qlearning are guaranteed to converge to an $\epsilon$accurate neighborhood of the global optimum by taking
$\tilde{\Omega}\left(\left( \frac{1}{(1\gamma)^6\epsilon^2}\right)^{\frac{1}{\omega}} +\left(\frac{1}{1\gamma}\right)^{\frac{1}{1\omega}}\right)$ iterations, where $\omega\in(0,1)$ is the decay parameter of the learning rate, and $\gamma$ is the discount factor. Our analysis develops novel techniques to derive finitetime bounds on the difference between two interconnected stochastic processes, which is new to the literature of stochastic approximation.
Author Information
Huaqing Xiong (Ohio State University)
Lin Zhao (National University of Singapore)
Yingbin Liang (The Ohio State University)
Wei Zhang (Southern University of Science and Technology)
Related Events (a corresponding poster, oral, or spotlight)

2020 Poster: FiniteTime Analysis for Double Qlearning »
Tue Dec 8th 05:00  07:00 PM Room Poster Session 1
More from the Same Authors

2020 Poster: Convergence of MetaLearning with TaskSpecific Adaptation over Partial Parameters »
Kaiyi Ji · Jason Lee · Yingbin Liang · H. Vincent Poor 
2020 Poster: Improving Sample Complexity Bounds for (Natural) ActorCritic Algorithms »
Tengyu Xu · Zhe Wang · Yingbin Liang 
2019 Poster: SpiderBoost and Momentum: Faster Variance Reduction Algorithms »
Zhe Wang · Kaiyi Ji · Yi Zhou · Yingbin Liang · Vahid Tarokh 
2019 Poster: FiniteSample Analysis for SARSA with Linear Function Approximation »
Shaofeng Zou · Tengyu Xu · Yingbin Liang 
2019 Poster: Two Timescale OffPolicy TD Learning: Nonasymptotic Analysis over Markovian Samples »
Tengyu Xu · Shaofeng Zou · Yingbin Liang 
2018 Poster: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang 
2018 Spotlight: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang 
2018 Poster: Minimax Estimation of Neural Net Distance »
Kaiyi Ji · Yingbin Liang