Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
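The abstract names three ingredients applied to states near the dataset: value smoothing, policy smoothing, and extra conservative value estimation. The sketch below only illustrates how such terms could be computed; it is not the authors' RORL implementation. Everything in it is an assumption for illustration: a hypothetical linear critic and linear deterministic policy, neighbours drawn uniformly from an L-infinity ball of radius eps, the largest penalty among sampled neighbours as a crude stand-in for a worst-case perturbation, squared distance as the policy-smoothness measure, and a plain mean-value push-down as the conservative term. The function name conservative_smoothing_terms and the parameters eps and n_samples are hypothetical.

```python
import numpy as np


def sample_neighbors(states, eps, n_samples, rng):
    """Sample n_samples states uniformly from the L-infinity ball of radius eps
    around each dataset state. Output shape: (batch, n_samples, state_dim)."""
    noise = rng.uniform(-eps, eps, size=(states.shape[0], n_samples, states.shape[1]))
    return states[:, None, :] + noise


def q_linear(w_s, w_a, s, a):
    """Hypothetical linear critic Q(s, a) = s . w_s + a . w_a (any leading dims)."""
    return s @ w_s + a @ w_a


def pi_linear(K, s):
    """Hypothetical linear deterministic policy pi(s) = K s, K of shape (da, ds)."""
    return s @ K.T


def conservative_smoothing_terms(states, actions, w_s, w_a, K,
                                 eps=0.01, n_samples=8, seed=0):
    """Return (value_smooth, policy_smooth, ood_value) for a batch of
    dataset states (B, ds) and dataset actions (B, da)."""
    rng = np.random.default_rng(seed)
    neighbors = sample_neighbors(states, eps, n_samples, rng)      # (B, N, ds)

    # Value smoothing: Q(s', a) should stay close to Q(s, a) when only the
    # state is perturbed; keep the largest squared gap among the samples.
    q_data = q_linear(w_s, w_a, states, actions)                   # (B,)
    q_near = q_linear(w_s, w_a, neighbors, actions[:, None, :])    # (B, N)
    value_smooth = np.mean(np.max((q_near - q_data[:, None]) ** 2, axis=1))

    # Policy smoothing: pi(s') should stay close to pi(s) (squared L2 distance).
    a_data = pi_linear(K, states)                                  # (B, da)
    a_near = pi_linear(K, neighbors)                               # (B, N, da)
    gaps = np.sum((a_near - a_data[:, None, :]) ** 2, axis=-1)     # (B, N)
    policy_smooth = np.mean(np.max(gaps, axis=1))

    # Conservative term: mean value of the policy's own actions at nearby
    # states; adding it (with a positive weight) to a critic loss would push
    # those estimates down.
    ood_value = np.mean(q_linear(w_s, w_a, neighbors, a_near))
    return value_smooth, policy_smooth, ood_value
```

In a training loop these terms would presumably be weighted and added to the losses, for example critic loss = TD loss + beta_Q * value_smooth + beta_ood * ood_value and actor loss = policy objective + beta_P * policy_smooth, where beta_Q, beta_P and beta_ood are hypothetical weights not given in the abstract. With eps = 0 both smoothing terms vanish, which is a quick sanity check that they only respond to state perturbations.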
Author Information
Rui Yang (Hong Kong University of Science and Technology)
I’m a first-year Ph.D. student in CSE at the Hong Kong University of Science and Technology, supervised by Prof. Tong Zhang. I received my master’s and bachelor’s degrees from the Department of Automation at Tsinghua University. My research interests lie in deep reinforcement learning (RL), especially goal-conditioned RL, offline RL, and model-based RL. I’m also interested in applying RL algorithms to game AI and robotics.
Chenjia Bai (Shanghai AI Laboratory)
Xiaoteng Ma (Department of Automation, Tsinghua University)
Zhaoran Wang (Northwestern University)
Chongjie Zhang (Tsinghua University)
Lei Han (Tencent AI Lab)
More from the Same Authors
2021 Spotlight: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao
2021 : MHER: Model-based Hindsight Experience Replay
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li
2022 Poster: LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning
Xi Chen · Ali Ghadirzadeh · Tianhe Yu · Jianhao Wang · Alex Yuan Gao · Wenzhe Li · Liang Bin · Chelsea Finn · Chongjie Zhang
2022 Poster: Mildly Conservative Q-Learning for Offline Reinforcement Learning
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan
2023 Poster: Cross-Domain Policy Adaptation via Value-Guided Data Filtering
Kang Xu · Chenjia Bai · Xiaoteng Ma · Dong Wang · Bin Zhao · Zhen Wang · Xuelong Li · Wei Li
2023 Poster: Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
Haoran He · Chenjia Bai · Kang Xu · Zhuoran Yang · Weinan Zhang · Dong Wang · Bin Zhao · Xuelong Li
2023 Poster: Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang · Boyi Liu · Zhaoran Wang · Tuo Zhao
2023 Poster: Learning Regularized Monotone Graphon Mean-Field Games
Fengzhuo Zhang · Vincent Tan · Zhaoran Wang · Zhuoran Yang
2023 Poster: Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
Shuang Qiu · Ziyu Dai · Han Zhong · Zhaoran Wang · Zhuoran Yang · Tong Zhang
2023 Poster: Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Chenlu Ye · Rui Yang · Quanquan Gu · Tong Zhang
2023 Poster: One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
Zhihan Liu · Miao Lu · WEI XIONG · Han Zhong · Hao Hu · Shenao Zhang · Sirui Zheng · Zhuoran Yang · Zhaoran Wang
2023 Poster: MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy
Honghua Dong · Jiawei Xu · Yu Yang · Rui Zhao · Shiwen Wu · Chun Yuan · Xiu Li · Chris Maddison · Lei Han
2022 Spotlight: Mildly Conservative Q-Learning for Offline Reinforcement Learning
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu
2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han
2022 Spotlight: Lightning Talks 5A-1
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo
2022 Poster: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence
Boyi Liu · Jiayang Li · Zhuoran Yang · Hoi-To Wai · Mingyi Hong · Yu Nie · Zhaoran Wang
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang
2022 Poster: Safe Opponent-Exploitation Subgame Refinement
Mingyang Liu · Chengjie Wu · Qihan Liu · Yansen Jing · Jun Yang · Pingzhong Tang · Chongjie Zhang
2022 Poster: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
Fengzhuo Zhang · Boyi Liu · Kaixin Wang · Vincent Tan · Zhuoran Yang · Zhaoran Wang
2022 Poster: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets
Yifei Min · Tianhao Wang · Ruitu Xu · Zhaoran Wang · Michael Jordan · Zhuoran Yang
2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou
2022 Poster: Low-Rank Modular Reinforcement Learning via Muscle Synergy
Heng Dong · Tonghan Wang · Jiayuan Liu · Chongjie Zhang
2022 Poster: Non-Linear Coordination Graphs
Yipeng Kang · Tonghan Wang · Qianlan Yang · Xiaoran Wu · Chongjie Zhang
2022 Poster: Exponential Family Model-Based Reinforcement Learning via Score Matching
Gene Li · Junbo Li · Anmol Kabra · Nati Srebro · Zhaoran Wang · Zhuoran Yang
2022 Poster: CUP: Critic-Guided Policy Reuse
Jin Zhang · Siyuan Li · Chongjie Zhang
2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning
Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo
2021 Poster: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang
2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou
2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang
2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong
2020 Poster: End-to-End Learning and Intervention in Games
Jiayang Li · Jing Yu · Yu Nie · Zhaoran Wang
2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments
Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan
2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie
2019 : Poster Spotlight 2
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare
2019 Poster: Statistical-Computational Tradeoff in Single Index Models
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang
2019 Poster: Curriculum-guided Hindsight Experience Replay
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang
2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang
2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang
2019 Poster: Convergent Policy Optimization for Safe Reinforcement Learning
Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao
2018 Poster: Contrastive Learning from Pairwise Measurements
Yi Chen · Zhuoran Yang · Yuchen Xie · Zhaoran Wang
2018 Poster: Provable Gaussian Embedding with One Observation
Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang
2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong
2017 Poster: Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma
Zhuoran Yang · Krishnakumar Balasubramanian · Zhaoran Wang · Han Liu