Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
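To make the alternating actor/critic structure concrete, here is a minimal sketch on a small discrete-time linear quadratic regulator. It is an illustration under simplifying assumptions, not the algorithm analyzed in the paper: the dynamics are assumed known and noise-free, the critic evaluates the current linear policy u = -Kx by iterating the Lyapunov recursion to near convergence instead of learning online from samples, and the actor takes a damped step along the standard LQR policy-gradient direction. The system matrices, the initial gain K0, the step-size rule, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def critic(A, B, Q, R, K, iters=1000):
    """Policy evaluation for u = -K x: iterate
    P <- Q + K'RK + (A - BK)' P (A - BK), which converges when A - BK is stable."""
    A_cl = A - B @ K
    stage_cost = Q + K.T @ R @ K
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = stage_cost + A_cl.T @ P @ A_cl
    return P

def actor_step(A, B, R, K, P, step):
    """Policy improvement: move K along E_K = (R + B'PB) K - B'PA,
    the direction that vanishes exactly at the optimal gain."""
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return K - step * E

def actor_critic_lqr(A, B, Q, R, K0, outer_iters=100):
    """Alternate critic and actor updates, starting from a stabilizing K0."""
    K = K0.copy()
    for _ in range(outer_iters):
        P = critic(A, B, Q, R, K)
        # Assumed step-size rule: half of 1 / ||R + B'PB|| (a damped update).
        step = 0.5 / np.linalg.norm(R + B.T @ P @ B, 2)
        K = actor_step(A, B, R, K, P, step)
    return K

def riccati_gain(A, B, Q, R, iters=2000):
    """Reference solution: optimal gain from Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ G)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Hypothetical example system: a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # stabilizing: both closed-loop eigenvalues are 0.9

K = actor_critic_lqr(A, B, Q, R, K0)
K_star = riccati_gain(A, B, Q, R)
print("actor-critic gain:", K)
print("Riccati gain:     ", K_star)
```

In this sketch the critic is run to near convergence before each actor step, which sidesteps the difficulty the abstract highlights: in the online setting the two updates interleave with only approximate critic estimates, and that is where convergence becomes delicate.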
Author Information
Zhuoran Yang (Princeton University)
Yongxin Chen (Georgia Institute of Technology)
Mingyi Hong (University of Minnesota)
Zhaoran Wang (Northwestern University)
More from the Same Authors
- 2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
  Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou
- 2020 Poster: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems
  Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong
- 2020 Poster: Understanding Gradient Clipping in Private SGD: A Geometric Perspective
  Xiangyi Chen · Steven Wu · Mingyi Hong
- 2020 Poster: Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
  Xiangyi Chen · Tiancong Chen · Haoran Sun · Steven Wu · Mingyi Hong
- 2020 Spotlight: Understanding Gradient Clipping in Private SGD: A Geometric Perspective
  Xiangyi Chen · Steven Wu · Mingyi Hong
- 2020 Spotlight: Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems
  Songtao Lu · Meisam Razaviyayn · Bo Yang · Kejun Huang · Mingyi Hong
- 2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
  Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
- 2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
  Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
- 2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning
  Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong
- 2020 Poster: End-to-End Learning and Intervention in Games
  Jiayang Li · Jing Yu · Yu Nie · Zhaoran Wang
- 2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments
  Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie
- 2020 Poster: Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations
  Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan
- 2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
  Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang
- 2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
  Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie
- 2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
  Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie
- 2019 Poster: Statistical-Computational Tradeoff in Single Index Models
  Lingxiao Wang · Zhuoran Yang · Zhaoran Wang
- 2019 Poster: Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
  Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang
- 2019 Poster: Neural Temporal-Difference Learning Converges to Global Optima
  Qi Cai · Zhuoran Yang · Jason Lee · Zhaoran Wang
- 2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation
  Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang
- 2019 Poster: Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
  Kaiqing Zhang · Zhuoran Yang · Tamer Basar
- 2019 Poster: Convergent Policy Optimization for Safe Reinforcement Learning
  Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang
- 2019 Poster: ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization
  Xiangyi Chen · Sijia Liu · Kaidi Xu · Xingguo Li · Xue Lin · Mingyi Hong · David Cox
- 2018 Poster: Contrastive Learning from Pairwise Measurements
  Yi Chen · Zhuoran Yang · Yuchen Xie · Zhaoran Wang
- 2018 Poster: Provable Gaussian Embedding with One Observation
  Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang
- 2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
  Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong
- 2017 Poster: Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma
  Zhuoran Yang · Krishnakumar Balasubramanian · Zhaoran Wang · Han Liu
- 2016 Poster: More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
  Xinyang Yi · Zhaoran Wang · Zhuoran Yang · Constantine Caramanis · Han Liu