Timezone: »
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-poicy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling where it should go so that the reward can be maximized, serving as the \textit{Prophet}. By doing so, our algorithm allows \textit{state-compositionality} from the dataset, rather than \textit{action-compositionality} conducted in prior imitation-style methods. We dumb this new approach Policy-guided Offline RL (\texttt{POR}). \texttt{POR} demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of \texttt{POR} in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-poicy.
Author Information
Haoran Xu (JD Technology)
Li Jiang (Tsinghua University)
Li Jianxiong (Tsinghua University)
Xianyuan Zhan (Tsinghua University, Tsinghua University)
More from the Same Authors
-
2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
Haoran Xu · Xianyuan Zhan · Honglei Yin · -
2021 : Offline Reinforcement Learning with Soft Behavior Regularization »
Haoran Xu · Xianyuan Zhan · Li Jianxiong · Honglei Yin -
2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
Haoran Xu · Xianyuan Zhan · Honglei Yin · -
2021 : Model-Based Offline Planning with Trajectory Pruning »
Xianyuan Zhan · Xiangyu Zhu · Haoran Xu -
2022 Poster: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
Haoyi Niu · Shubham Sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 : Distance-Sensitive Offline Reinforcement Learning »
Li Jianxiong · Xianyuan Zhan · Haoran Xu · Xiangyu Zhu · Jingjing Liu · Ya-Qin Zhang -
2023 Poster: Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL »
Peng Cheng · Xianyuan Zhan · zhihao wu · Wenjia Zhang · Youfang Lin · Shou cheng Song · Han Wang · Li Jiang -
2023 Poster: Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization »
Xiangsen Wang · Haoran Xu · Yinan Zheng · Xianyuan Zhan -
2022 Spotlight: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
Haoyi Niu · Shubham Sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo