Poster
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Jose Blanchet · Miao Lu · Tong Zhang · Han Zhong
We study distributionally robust offline reinforcement learning (RL), which seeks to find, purely from an offline dataset, an optimal robust policy that performs well in perturbed environments. We propose a generic algorithmic framework, Doubly Pessimistic Model-based Policy Optimization ($\texttt{P}^2\texttt{MPO}$), for robust offline RL, which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. The double pessimism principle is crucial for overcoming the distribution shift incurred by (i) the mismatch between the behavior policy and the family of target policies, and (ii) the perturbation of the nominal model. Under certain accuracy assumptions on the model estimation subroutine, we show that $\texttt{P}^2\texttt{MPO}$ is provably sample-efficient with robust partial coverage data, meaning that the offline dataset has good coverage of the distributions induced by the optimal robust policy and the perturbed models around the nominal model. By tailoring specific model estimation subroutines to concrete examples, including the tabular Robust Markov Decision Process (RMDP), the factored RMDP, and RMDPs with kernel and neural function approximations, we show that $\texttt{P}^2\texttt{MPO}$ enjoys an $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the number of trajectories in the offline dataset. Notably, all of these models except the tabular case are first identified and proved tractable in this paper. To the best of our knowledge, this is the first work to propose a general learning principle, double pessimism, for robust offline RL and to show that it is provably efficient in the context of general function approximation.
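As a rough illustration of the double pessimism principle described in the abstract, the policy optimization step can be sketched as a nested minimization inside the policy search. The notation here ($\Pi$ for the family of target policies, $\widehat{\mathcal{M}}$ for a confidence set of models produced by the model estimation subroutine, $\mathcal{P}(P)$ for the perturbation set around a model $P$, and $V^{\pi}_{\tilde{P}}$ for the value of policy $\pi$ under model $\tilde{P}$) is illustrative shorthand rather than the paper's exact definitions:
$$\hat{\pi} \in \operatorname*{arg\,max}_{\pi \in \Pi} \; \min_{P \in \widehat{\mathcal{M}}} \; \inf_{\tilde{P} \in \mathcal{P}(P)} \; V^{\pi}_{\tilde{P}}.$$
Schematically, the outer minimum over $\widehat{\mathcal{M}}$ is pessimism against model estimation error arising from partial coverage of the offline data, while the inner infimum over $\mathcal{P}(P)$ is pessimism against the perturbation of the nominal model.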
Author Information
Jose Blanchet (Stanford University)
Miao Lu (Stanford University)
Tong Zhang (The Hong Kong University of Science and Technology)
Han Zhong (Peking University)
More from the Same Authors
- 2022: A Neural Tangent Kernel Perspective on Function-Space Regularization in Neural Networks
  Zonghao Chen · Xupeng Shi · Tim G. J. Rudner · Qixuan Feng · Weizhong Zhang · Tong Zhang
- 2022: Minimax Optimal Kernel Operator Learning via Multilevel Training
  Jikai Jin · Yiping Lu · Jose Blanchet · Lexing Ying
- 2022: Particle-based Variational Inference with Preconditioned Functional Gradient Flow
  Hanze Dong · Xi Wang · Yong Lin · Tong Zhang
- 2022: Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint
  Hao Liu · Minshuo Chen · Siawpeng Er · Wenjing Liao · Tong Zhang · Tuo Zhao
- 2022: Synthetic Principle Component Design: Fast Covariate Balancing with Synthetic Controls
  Yiping Lu · Jiajin Li · Lexing Ying · Jose Blanchet
- 2023: Representation Learning for Extremes
  Ali Hasan · Yuting Ng · Jose Blanchet · Vahid Tarokh
- 2023: Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
  Miao Lu · Beining Wu · Xiaodong Yang · Difan Zou
- 2023: Accelerated Sampling of Rare Events using a Neural Network Bias Potential
  Xinru Hua · Rasool Ahmad · Jose Blanchet · Wei Cai
- 2023 Poster: Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
  Jiayi Huang · Han Zhong · Liwei Wang · Lin Yang
- 2023 Poster: Double Randomized Underdamped Langevin with Dimension-Independent Convergence Guarantee
  Yuanshi Liu · Cong Fang · Tong Zhang
- 2023 Poster: When can Regression-Adjusted Control Variate Help? Rare Events, Sobolev Embedding and Minimax Optimality
  Jose Blanchet · Haoxuan Chen · Yiping Lu · Lexing Ying
- 2023 Poster: Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
  Shuang Qiu · Ziyu Dai · Han Zhong · Zhaoran Wang · Zhuoran Yang · Tong Zhang
- 2023 Poster: Corruption-Robust Offline Reinforcement Learning with General Function Approximation
  Chenlu Ye · Rui Yang · Quanquan Gu · Tong Zhang
- 2023 Poster: Payoff-based Learning with Matrix Multiplicative Weights in Quantum Games
  Kyriakos Lotidis · Panayotis Mertikopoulos · Nicholas Bambos · Jose Blanchet
- 2023 Poster: Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization
  Taoli Zheng · Linglingzhi Zhu · Anthony Man-Cho So · Jose Blanchet · Jiajin Li
- 2023 Poster: A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
  Yunchang Yang · Han Zhong · Tianhao Wu · Bin Liu · Liwei Wang · Simon Du
- 2023 Poster: Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
  Zhihan Liu · Miao Lu · WEI XIONG · Han Zhong · Hao Hu · Shenao Zhang · Sirui Zheng · Zhuoran Yang · Zhaoran Wang
- 2023 Poster: Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training
  Rie Johnson · Tong Zhang
- 2023 Poster: A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
  Han Zhong · Tong Zhang
- 2022 Spotlight: Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
  Binghui Li · Jikai Jin · Han Zhong · John Hopcroft · Liwei Wang
- 2022 Spotlight: Lightning Talks 2A-1
  Caio Kalil Lauand · Ryan Strauss · Yasong Feng · lingyu gu · Alireza Fathollah Pour · Oren Mangoubi · Jianhao Ma · Binghui Li · Hassan Ashtiani · Yongqi Du · Salar Fattahi · Sean Meyn · Jikai Jin · Nisheeth K. Vishnoi · zengfeng Huang · Junier B Oliva · yuan zhang · Han Zhong · Tianyu Wang · John Hopcroft · Di Xie · Shiliang Pu · Liwei Wang · Robert Qiu · Zhenyu Liao
- 2022 Poster: Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
  Binghui Li · Jikai Jin · Han Zhong · John Hopcroft · Liwei Wang
- 2022 Poster: Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
  Yiping Lu · Jose Blanchet · Lexing Ying
- 2022 Poster: Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints
  Jiajin Li · Sirui Lin · Jose Blanchet · Viet Anh Nguyen
- 2022 Poster: When is the Convergence Time of Langevin Algorithms Dimension Independent? A Composite Optimization Viewpoint
  Yoav S Freund · Yi-An Ma · Tong Zhang
- 2022 Poster: Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
  Alekh Agarwal · Tong Zhang
- 2022 Poster: Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
  Jiafan He · Dongruo Zhou · Tong Zhang · Quanquan Gu
- 2021: HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
  Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo
- 2021: Statistical Numerical PDE: Fast Rate, Neural Scaling Law and When it’s Optimal
  Yiping Lu · Haoxuan Chen · Jianfeng Lu · Lexing Ying · Jose Blanchet
- 2021 Poster: Adversarial Regression with Doubly Non-negative Weighting Matrices
  Tam Le · Truyen Nguyen · Makoto Yamada · Jose Blanchet · Viet Anh Nguyen
- 2021 Poster: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
  Han Zhong · Jiayi Huang · Lin Yang · Liwei Wang
- 2021 Poster: Modified Frank Wolfe in Probability Space
  Carson Kent · Jiajin Li · Jose Blanchet · Peter W Glynn
- 2020 Poster: Distributionally Robust Parametric Maximum Likelihood Estimation
  Viet Anh Nguyen · Xuhui Zhang · Jose Blanchet · Angelos Georghiou
- 2020 Poster: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
  Nian Si · Jose Blanchet · Soumyadip Ghosh · Mark Squillante
- 2020 Spotlight: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality
  Nian Si · Jose Blanchet · Soumyadip Ghosh · Mark Squillante
- 2020 Poster: Distributionally Robust Local Non-parametric Conditional Estimation
  Viet Anh Nguyen · Fan Zhang · Jose Blanchet · Erick Delage · Yinyu Ye
- 2019 Poster: Learning in Generalized Linear Contextual Bandits with Stochastic Delays
  Zhengyuan Zhou · Renyuan Xu · Jose Blanchet
- 2019 Spotlight: Learning in Generalized Linear Contextual Bandits with Stochastic Delays
  Zhengyuan Zhou · Renyuan Xu · Jose Blanchet
- 2019 Poster: Online EXP3 Learning in Adversarial Bandits with Delayed Feedback
  Ilai Bistritz · Zhengyuan Zhou · Xi Chen · Nicholas Bambos · Jose Blanchet
- 2019 Poster: Multivariate Distributionally Robust Convex Regression under Absolute Error Loss
  Jose Blanchet · Peter W Glynn · Jun Yan · Zhengqing Zhou
- 2019 Poster: Semi-Parametric Dynamic Contextual Pricing
  Virag Shah · Ramesh Johari · Jose Blanchet
- 2017 Poster: Diffusion Approximations for Online Principal Component Estimation and Global Convergence
  Chris Junchi Li · Mengdi Wang · Tong Zhang
- 2017 Oral: Diffusion Approximations for Online Principal Component Estimation and Global Convergence
  Chris Junchi Li · Mengdi Wang · Tong Zhang
- 2017 Poster: On Quadratic Convergence of DC Proximal Newton Algorithm in Nonconvex Sparse Learning
  Xingguo Li · Lin Yang · Jason Ge · Jarvis Haupt · Tong Zhang · Tuo Zhao