Spotlight
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi
Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved towards characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g., $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves upon any prior memory-efficient algorithm that is asymptotically regret-optimal by at least a factor of $S^5A^3$. Leveraging the recently introduced variance reduction strategy (also called {\em reference-advantage decomposition}), the proposed algorithm employs an {\em early-settled} reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration-exploitation trade-offs.
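To make the mechanism concrete, below is a minimal Python sketch of optimistic/pessimistic Q-learning with an early-settled reference value, in the spirit of the description above. It is an illustrative simplification rather than the paper's actual algorithm: the reference-advantage (variance-reduced) Q-update itself is omitted, and the environment interface (env.reset(), env.step(h, s, a)) as well as the constants c_b and settle_gap are hypothetical placeholders.

```python
import numpy as np

def ucb_lcb_q_learning(env, S, A, H, K, c_b=1.0, settle_gap=1.0):
    """Illustrative sketch: optimistic/pessimistic Q-learning with an
    early-settled reference value, for a tabular finite-horizon MDP.

    Hypothetical environment interface (not from the paper):
      env.reset()        -> initial state (int in [0, S))
      env.step(h, s, a)  -> (next_state, reward) at step h

    The paper's algorithm additionally performs a reference-advantage
    (variance-reduced) Q-update, which is omitted here for brevity.
    """
    T = K * H
    iota = np.log(S * A * T)                       # log factor inside the bonus

    Q_ucb = np.full((H, S, A), float(H))           # optimistic Q estimates
    Q_lcb = np.zeros((H, S, A))                    # pessimistic Q estimates
    V_ucb = np.full((H + 1, S), float(H)); V_ucb[H] = 0.0
    V_lcb = np.zeros((H + 1, S))
    V_ref = np.full((H + 1, S), float(H)); V_ref[H] = 0.0  # reference values
    settled = np.zeros((H + 1, S), dtype=bool)     # has the reference been frozen?
    N = np.zeros((H, S, A), dtype=int)             # visitation counts
    # Total memory is a few (H, S, A) tables, i.e. O(SAH) space.

    for _ in range(K):                             # K episodes
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q_ucb[h, s]))        # act greedily w.r.t. optimistic Q
            s_next, r = env.step(h, s, a)

            N[h, s, a] += 1
            n = N[h, s, a]
            eta = (H + 1) / (H + n)                # rescaled linear learning rate
            bonus = c_b * np.sqrt(H ** 3 * iota / n)   # Hoeffding-style bonus

            # Upper- and lower-confidence-bound Q-learning updates.
            Q_ucb[h, s, a] = (1 - eta) * Q_ucb[h, s, a] + eta * (r + V_ucb[h + 1, s_next] + bonus)
            Q_lcb[h, s, a] = (1 - eta) * Q_lcb[h, s, a] + eta * (r + V_lcb[h + 1, s_next] - bonus)

            # Keep the value estimates monotone (UCB shrinks, LCB grows).
            V_ucb[h, s] = min(V_ucb[h, s], Q_ucb[h, s].max())
            V_lcb[h, s] = max(V_lcb[h, s], Q_lcb[h, s].max())

            # Early-settled reference: once the UCB/LCB gap at (h, s) falls
            # below a constant, freeze the reference so it stops drifting.
            if not settled[h, s] and V_ucb[h, s] - V_lcb[h, s] <= settle_gap:
                V_ref[h, s] = V_ucb[h, s]
                settled[h, s] = True

            s = s_next

    return Q_ucb, V_ref
```

The point of settling the reference early is that once $V^{\mathrm{ref}}$ stops changing, variance-reduced updates built on top of it no longer pay for a drifting reference; per the abstract, this is what brings the burn-in cost down to the order of $SA\,\mathrm{poly}(H)$ while keeping the $O(SAH)$ memory footprint.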
Author Information
Gen Li (Tsinghua University)
Laixi Shi (Carnegie Mellon University)
I'm actively looking for a research internship in Summer 2022. My research interests include signal processing, nonconvex optimization, high-dimensional statistical estimation, and reinforcement learning, ranging from theory to application.
Yuxin Chen (Princeton University)
Yuantao Gu (Tsinghua University)
Yuejie Chi (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
  Thu. Dec 9th 12:30 -- 02:00 AM
More from the Same Authors
- 2021 : DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization
  Boyue Li · Zhize Li · Yuejie Chi
- 2021 : Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence
  Wenhao Zhan · Shicong Cen · Baihe Huang · Yuxin Chen · Jason Lee · Yuejie Chi
- 2021 : Latent Goal Allocation for Multi-Agent Goal-Conditioned Self-Supervised Learning
  Laixi Shi · Peide Huang · Rui Chen
- 2022 : A Multi-Token Coordinate Descent Method for Vertical Federated Learning
  Pedro Valdeira · Yuejie Chi · Claudia Soares · Joao Xavier
- 2023 Poster: Counterfactual Generation with Identifiability Guarantee
  hanqi yan · Lingjing Kong · Lin Gui · Yuejie Chi · Eric Xing · Yulan He · Kun Zhang
- 2023 Poster: The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model
  Laixi Shi · Gen Li · Yuting Wei · Yuxin Chen · Matthieu Geist · Yuejie Chi
- 2023 Poster: Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
  Wenhao Ding · Laixi Shi · Yuejie Chi · DING ZHAO
- 2023 Poster: Identification of Nonlinear Latent Hierarchical Models
  Lingjing Kong · Biwei Huang · Feng Xie · Eric Xing · Yuejie Chi · Kun Zhang
- 2023 Poster: Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
  Gen Li · Wenhao Zhan · Jason Lee · Yuejie Chi · Yuxin Chen
- 2022 Poster: Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
  Peide Huang · Mengdi Xu · Jiacheng Zhu · Laixi Shi · Fei Fang · DING ZHAO
- 2022 Poster: BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression
  Haoyu Zhao · Boyue Li · Zhize Li · Peter Richtarik · Yuejie Chi
- 2022 Poster: Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model
  Gen Li · Yuejie Chi · Yuting Wei · Yuxin Chen
- 2022 Poster: SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression
  Zhize Li · Haoyu Zhao · Boyue Li · Yuejie Chi
- 2021 Poster: Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
  Shicong Cen · Yuting Wei · Yuejie Chi
- 2021 Poster: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
  Gen Li · Yuxin Chen · Yuejie Chi · Yuantao Gu · Yuting Wei
- 2020 Poster: Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
  Gen Li · Yuting Wei · Yuejie Chi · Yuantao Gu · Yuxin Chen
- 2020 Poster: Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
  Gen Li · Yuting Wei · Yuejie Chi · Yuantao Gu · Yuxin Chen
- 2019 Poster: Nonconvex Low-Rank Symmetric Tensor Completion from Noisy Data
  Changxiao Cai · Gen Li · H. Vincent Poor · Yuxin Chen