Timezone: »
Offline reinforcement learning (RL) provides a promising direction to exploit massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
Author Information
Rui Yang (Hong Kong University of Science and Technology)
I’m a first year Ph.D. student at CSE, the Hong Kong University of Science and Technology, supervised by Prof. Tong Zhang. I received my master’s degree and bachelor’s degree from the Department of Automation at Tsinghua University. My research interests lie in deep reinforcement learning (RL), especially goal-conditioned RL, offline RL and model-based RL. I’m also interested in the application of RL algorithms to game AI and robotics.
Chenjia Bai (Shanghai AI Laboratory)
Xiaoteng Ma (Department of Automation, Tsinghua University)
Zhaoran Wang (Northwestern University)
Chongjie Zhang (Tsinghua University)
Lei Han (Tencent AI Lab)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Poster: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing »
Dates n/a. Room
More from the Same Authors
-
2021 Spotlight: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning »
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao -
2021 : MHER: Model-based Hindsight Experience Replay »
Yang Rui · Meng Fang · Lei Han · Yali Du · Feng Luo · Xiu Li -
2022 Poster: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 : Multi-Agent Policy Transfer via Task Relationship Modeling »
Rong-Jun Qin · Feng Chen · Tonghan Wang · Lei Yuan · Xiaoran Wu · Yipeng Kang · Zongzhang Zhang · Chongjie Zhang · Yang Yu -
2022 : Model and Method: Training-Time Attack for Cooperative Multi-Agent Reinforcement Learning »
Siyang Wu · Tonghan Wang · Xiaoran Wu · Jingfeng ZHANG · Yujing Hu · Changjie Fan · Chongjie Zhang -
2022 Spotlight: Mildly Conservative Q-Learning for Offline Reinforcement Learning »
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu -
2022 Spotlight: CUP: Critic-Guided Policy Reuse »
Jin Zhang · Siyuan Li · Chongjie Zhang -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 Spotlight: Non-Linear Coordination Graphs »
Yipeng Kang · Tonghan Wang · Qianlan Yang · Chongjie Zhang -
2022 Poster: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence »
Boyi Liu · Jiayang Li · Zhuoran Yang · Hoi-To Wai · Mingyi Hong · Yu Nie · Zhaoran Wang -
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang -
2022 Poster: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL »
Fengzhuo Zhang · Boyi Liu · Kaixin Wang · Vincent Tan · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets »
Yifei Min · Tianhao Wang · Ruitu Xu · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Poster: Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping »
Hao Sun · Lei Han · Rui Yang · Xiaoteng Ma · Jian Guo · Bolei Zhou -
2022 Poster: Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Anmol Kabra · Nati Srebro · Zhaoran Wang · Zhuoran Yang -
2022 Poster: FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning »
Xiao-Yang Liu · Ziyi Xia · Jingyang Rui · Jiechao Gao · Hongyang Yang · Ming Zhu · Christina Wang · Zhaoran Wang · Jian Guo -
2021 Poster: Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning »
Yiqin Yang · Xiaoteng Ma · Chenghao Li · Zewu Zheng · Qiyuan Zhang · Gao Huang · Jun Yang · Qianchuan Zhao -
2021 Poster: Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration »
Lulu Zheng · Jiarui Chen · Jianhao Wang · Jiamin He · Yujing Hu · Yingfeng Chen · Changjie Fan · Yang Gao · Chongjie Zhang -
2021 Poster: On the Estimation Bias in Double Q-Learning »
Zhizhou Ren · Guangxiang Zhu · Hao Hu · Beining Han · Jianglun Chen · Chongjie Zhang -
2021 Poster: Model-Based Reinforcement Learning via Imagination with Derived Memory »
Yao Mu · Yuzheng Zhuang · Bin Wang · Guangxiang Zhu · Wulong Liu · Jianyu Chen · Ping Luo · Shengbo Li · Chongjie Zhang · Jianye Hao -
2021 Poster: Offline Reinforcement Learning with Reverse Model-based Imagination »
Jianhao Wang · Wenzhe Li · Haozhe Jiang · Guangxiang Zhu · Siyuan Li · Chongjie Zhang -
2021 Poster: Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization »
Jianhao Wang · Zhizhou Ren · Beining Han · Jianing Ye · Chongjie Zhang -
2021 Poster: Celebrating Diversity in Shared Multi-Agent Reinforcement Learning »
Chenghao Li · Tonghan Wang · Chengjie Wu · Qianchuan Zhao · Jun Yang · Chongjie Zhang -
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration »
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang -
2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework »
Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou -
2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2020 Poster: End-to-End Learning and Intervention in Games »
Jiayang Li · Jing Yu · Yu Nie · Zhaoran Wang -
2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie -
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces »
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan -
2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss »
Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang -
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2020 Poster: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning »
Guangxiang Zhu · Minghao Zhang · Honglak Lee · Chongjie Zhang -
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Statistical-Computational Tradeoff in Single Index Models »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2019 Poster: Curriculum-guided Hindsight Experience Replay »
Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang -
2019 Poster: Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost »
Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang -
2019 Poster: Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards »
Siyuan Li · Rui Wang · Minxue Tang · Chongjie Zhang -
2019 Poster: Variance Reduced Policy Evaluation with Smooth Function Approximation »
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang -
2019 Poster: Convergent Policy Optimization for Safe Reinforcement Learning »
Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang -
2019 Poster: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning »
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao -
2018 Poster: Contrastive Learning from Pairwise Measurements »
Yi Chen · Zhuoran Yang · Yuchen Xie · Zhaoran Wang -
2018 Poster: Provable Gaussian Embedding with One Observation »
Ming Yu · Zhuoran Yang · Tuo Zhao · Mladen Kolar · Zhaoran Wang -
2018 Poster: Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2018 Poster: Object-Oriented Dynamics Predictor »
Guangxiang Zhu · Zhiao Huang · Chongjie Zhang -
2017 Poster: Estimating High-dimensional Non-Gaussian Multiple Index Models via Stein’s Lemma »
Zhuoran Yang · Krishnakumar Balasubramanian · Zhaoran Wang · Han Liu