Timezone: »
Poster
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
Ming Yin · Yu Bai · Yu-Xiang Wang
We consider the problem of offline reinforcement learning (RL) --- a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understandings of offline RL, such as its optimal sample complexity, remain largely open even in basic settings such as \emph{tabular} Markov Decision Processes (MDPs). In this paper, we propose \emph{Off-Policy Double Variance Reduction} (OPDVR), a new variance reduction-based algorithm for offline RL. Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon \emph{stationary transition} setting, where $H$ is the horizon length and $d_m$ is the minimal marginal state-action distribution induced by the behavior policy. This improves over the best-known upper bound by a factor of $H$. Moreover, we establish an information-theoretic lower bound of $\Omega(H^2/d_m\epsilon^2)$ which certifies that OPDVR is optimal up to logarithmic factors. Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as the finite-horizon MDPs with non-stationary transitions and the infinite horizon MDPs with discounted rewards.
Author Information
Ming Yin (UC Santa Barbara)
Yu Bai (Salesforce Research)
Yu-Xiang Wang (UC Santa Barbara)
More from the Same Authors
-
2021 Spotlight: Logarithmic Regret in Feature-based Dynamic Pricing »
Jianyu Xu · Yu-Xiang Wang -
2021 Spotlight: Understanding the Under-Coverage Bias in Uncertainty Estimation »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2021 : Instance-dependent Offline Reinforcement Learning: From tabular RL to linear MDPs »
Ming Yin · Yu-Xiang Wang -
2022 : Offline Reinforcement Learning with Closed-Form Policy Improvement Operators »
Jiachen Li · Edwin Zhang · Ming Yin · Qinxun Bai · Yu-Xiang Wang · William Yang Wang -
2023 Poster: What can a Single Attention Layer Learn? A Study Through the Random Features Lens »
Hengyu Fu · Tianyu Guo · Yu Bai · Song Mei -
2023 Poster: Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection »
Yu Bai · Fan Chen · Huan Wang · Caiming Xiong · Song Mei -
2023 Poster: Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations »
Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang -
2023 Oral: Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection »
Yu Bai · Fan Chen · Huan Wang · Caiming Xiong · Song Mei -
2022 Poster: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials »
Eshaan Nichani · Yu Bai · Jason Lee -
2022 Poster: Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent »
Yu Bai · Chi Jin · Song Mei · Ziang Song · Tiancheng Yu -
2022 Poster: Policy Optimization for Markov Games: Unified Framework and Faster Convergence »
Runyu Zhang · Qinghua Liu · Huan Wang · Caiming Xiong · Na Li · Yu Bai -
2022 Poster: Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games »
Ziang Song · Song Mei · Yu Bai -
2021 Poster: Privately Publishable Per-instance Privacy »
Rachel Redberg · Yu-Xiang Wang -
2021 Poster: Logarithmic Regret in Feature-based Dynamic Pricing »
Jianyu Xu · Yu-Xiang Wang -
2021 Poster: Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings »
Ming Yin · Yu-Xiang Wang -
2021 Poster: Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
Yu Bai · Chi Jin · Huan Wang · Caiming Xiong -
2021 Poster: Understanding the Under-Coverage Bias in Uncertainty Estimation »
Yu Bai · Song Mei · Huan Wang · Caiming Xiong -
2021 Poster: Towards Instance-Optimal Offline Reinforcement Learning with Pessimism »
Ming Yin · Yu-Xiang Wang -
2021 Poster: Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning »
Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai -
2019 Poster: Provably Efficient Q-Learning with Low Switching Cost »
Yu Bai · Tengyang Xie · Nan Jiang · Yu-Xiang Wang -
2017 Poster: Higher-Order Total Variation Classes on Grids: Minimax Theory and Trend Filtering Methods »
Veeranjaneyulu Sadhanala · Yu-Xiang Wang · James Sharpnack · Ryan Tibshirani -
2016 : Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang -
2016 Poster: Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers »
Veeranjaneyulu Sadhanala · Yu-Xiang Wang · Ryan Tibshirani -
2015 : Yu-Xiang Wang: Learning with differential privacy: stability, learnability and the sufficiency and necessity of ERM principle »
Yu-Xiang Wang -
2015 Poster: Differentially private subspace clustering »
Yining Wang · Yu-Xiang Wang · Aarti Singh