Offline reinforcement learning (RL) enables learning policies from pre-collected datasets without environment interaction, providing a promising way to apply RL in real-world systems. Although recent offline RL studies have made considerable progress, existing methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the need for extra control flexibility. The model-based planning framework offers an attractive solution for such tasks. However, most model-based planning algorithms are not designed for offline settings, and simply combining the ingredients of offline RL with existing methods either yields overly restrictive planning or leads to inferior performance. We propose MOPP, a new lightweight model-based offline planning framework that tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollouts guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
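To make the rollout-and-prune idea above concrete, the sketch below shows one plausible planning loop: candidate actions are sampled from a behavior policy learned from the offline data, trajectories are rolled out with an ensemble of learned dynamics models, and rollouts with large ensemble disagreement (a common proxy for out-of-distribution states) are pruned before picking the highest-return survivor. This is a minimal illustration under assumed interfaces (`behavior_policy`, `dynamics_ensemble`, `reward_fn`, the disagreement threshold), not the authors' reference implementation.

```python
# Illustrative sketch of a MOPP-style rollout-and-prune planner.
# All object interfaces here are assumptions, not the paper's actual code.
import numpy as np

def plan(state, behavior_policy, dynamics_ensemble, reward_fn,
         horizon=10, n_rollouts=100, max_disagreement=0.5):
    """Roll out candidate trajectories guided by the learned behavior policy,
    prune those with high dynamics-model disagreement, and return the first
    action of the best surviving trajectory."""
    best_return, best_action = -np.inf, None
    for _ in range(n_rollouts):
        s, total_return, first_action, ok = state, 0.0, None, True
        for _ in range(horizon):
            # Sample a candidate action from the behavior policy learned
            # from the offline dataset.
            a = behavior_policy.sample(s)
            if first_action is None:
                first_action = a
            # Predict the next state with every model in the ensemble.
            next_states = np.stack([m.predict(s, a) for m in dynamics_ensemble])
            # Prune the trajectory if the models disagree too much,
            # treating disagreement as a signal of out-of-distribution data.
            if next_states.std(axis=0).max() > max_disagreement:
                ok = False
                break
            total_return += reward_fn(s, a)
            s = next_states.mean(axis=0)
        if ok and total_return > best_return:
            best_return, best_action = total_return, first_action
    return best_action
```

At deployment time such a planner would be called once per control step with the current state, which is what gives model-based planning the extra control flexibility mentioned in the abstract.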
Author Information
Xianyuan Zhan (Tsinghua University)
Xiangyu Zhu (Tianjin University)
Haoran Xu (JD Technology)
More from the Same Authors
- 2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
  Haoran Xu · Xianyuan Zhan · Honglei Yin ·
- 2021 : Offline Reinforcement Learning with Soft Behavior Regularization »
  Haoran Xu · Xianyuan Zhan · Li Jianxiong · Honglei Yin
- 2022 Poster: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
  Haoyi Niu · Shubham Sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan
- 2022 Poster: A Policy-Guided Imitation Approach for Offline Reinforcement Learning »
  Haoran Xu · Li Jiang · Li Jianxiong · Xianyuan Zhan
- 2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
  Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou
- 2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
  Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan
- 2022 : Distance-Sensitive Offline Reinforcement Learning »
  Li Jianxiong · Xianyuan Zhan · Haoran Xu · Xiangyu Zhu · Jingjing Liu · Ya-Qin Zhang
- 2023 Poster: Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL »
  Peng Cheng · Xianyuan Zhan · zhihao wu · Wenjia Zhang · Youfang Lin · Shou cheng Song · Han Wang · Li Jiang
- 2023 Poster: Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization »
  Xiangsen Wang · Haoran Xu · Yinan Zheng · Xianyuan Zhan
- 2022 Spotlight: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
  Haoyi Niu · Shubham Sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan
- 2022 Spotlight: Lightning Talks 5A-1 »
  Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo