Timezone: »
In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumulation of deep \textit{Q} function in out-of-distribution (OOD) areas. Unfortunately, existing offline RL methods are often over-conservative, inevitably hurting generalization performance outside data distribution. In our study, one interesting observation is that deep \textit{Q} functions approximate well inside the convex hull of training data. Inspired by this, we propose a new method, \textit{DOGE (Distance-sensitive Offline RL with better GEneralization)}. DOGE marries dataset geometry with deep function approximators in offline RL, and enables exploitation in generalizable OOD areas rather than strictly constraining policy within data distribution. Specifically, DOGE trains a state-conditioned distance function that can be readily plugged into standard actor-critic methods as a policy constraint. Simple yet elegant, our algorithm enjoys better generalization compared to state-of-the-art methods on D4RL benchmarks. Theoretical analysis demonstrates the superiority of our approach to existing methods that are solely based on data distribution or support constraints.
Author Information
Li Jianxiong (Tsinghua University)
Xianyuan Zhan (Tsinghua University, Tsinghua University)
Haoran Xu (JD Technology)
Xiangyu Zhu (Tianjin University)
Jingjing Liu (Microsoft)
Ya-Qin Zhang (George Washington University)
More from the Same Authors
-
2021 : VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation »
Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu -
2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
Haoran Xu · Xianyuan Zhan · Honglei Yin · -
2021 : Offline Reinforcement Learning with Soft Behavior Regularization »
Haoran Xu · Xianyuan Zhan · Li Jianxiong · Honglei Yin -
2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
Haoran Xu · Xianyuan Zhan · Honglei Yin · -
2021 : Model-Based Offline Planning with Trajectory Pruning »
Xianyuan Zhan · Xiangyu Zhu · Haoran Xu -
2022 Poster: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
Haoyi Niu · shubham sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan -
2022 Poster: A Policy-Guided Imitation Approach for Offline Reinforcement Learning »
Haoran Xu · Li Jiang · Li Jianxiong · Xianyuan Zhan -
2022 Poster: TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation »
Pengfei Li · Beiwen Tian · Yongliang Shi · Xiaoxue Chen · Hao Zhao · Guyue Zhou · Ya-Qin Zhang -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 Spotlight: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
Haoyi Niu · shubham sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · shubham sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2021 Poster: Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective »
Tianlong Chen · Yu Cheng · Zhe Gan · Jingjing Liu · Zhangyang Wang -
2021 Poster: The Elastic Lottery Ticket Hypothesis »
Xiaohan Chen · Yu Cheng · Shuohang Wang · Zhe Gan · Jingjing Liu · Zhangyang Wang