We study the problem of offline imitation learning (IL), where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a static offline dataset of state-action-next state transition triples from both optimal and non-optimal expert behaviors. This strictly offline imitation learning problem arises in many real-world settings where environment interactions and expert annotations are costly. Prior works that address the problem either require that expert data occupy the majority of the offline dataset, or need to learn a reward function and then perform offline reinforcement learning (RL) based on the learned reward. In this paper, we propose an imitation learning algorithm that addresses the problem without the additional steps of reward learning and offline RL training, for the case when the demonstrations contain a large proportion of suboptimal data. Building upon behavioral cloning (BC), we introduce an additional discriminator to distinguish expert from non-expert data, and we propose a cooperation strategy to boost the performance of both tasks. This results in a new policy learning objective which, surprisingly, we find to be equivalent to a generalized BC objective in which the outputs of the discriminator serve as the weights of the BC loss function. Experimental results show that the proposed algorithm can learn behavior policies that are much closer to the optimal policies than those learned by baseline algorithms.
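The idea of a BC loss weighted by discriminator outputs can be illustrated with a minimal sketch. This is not the paper's exact objective, which the abstract does not spell out; it assumes the simplest form, in which each sample's negative log-likelihood is multiplied by a weight in [0, 1]. The function name `weighted_bc_loss` and the example numbers are hypothetical.

```python
import math
from typing import Sequence


def weighted_bc_loss(log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted negative log-likelihood: -(1/N) * sum_i w_i * log pi(a_i | s_i).

    log_probs: log-probability the policy assigns to each dataset action.
    weights:   per-sample weights (assumed here to be discriminator outputs).
    """
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    if not log_probs:
        raise ValueError("need at least one sample")
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


# Hypothetical batch of three transitions: policy likelihoods of the logged
# actions, and discriminator outputs (higher = more expert-like).
log_probs = [math.log(0.9), math.log(0.2), math.log(0.6)]
disc_outputs = [0.95, 0.05, 0.70]

# Plain BC is the special case where every weight equals 1.
plain_bc = weighted_bc_loss(log_probs, [1.0] * len(log_probs))
# Weighted BC down-weights the sample the discriminator judges non-expert.
weighted_bc = weighted_bc_loss(log_probs, disc_outputs)
```

With all weights set to 1 the function reduces to ordinary BC, which is why the weighted form can be read as a generalization; samples with weights near 0 contribute almost nothing to the policy update.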
Author Information
Haoran Xu (JD Technology)
Xianyuan Zhan (Tsinghua University)
Honglei Yin (JD Technology)
More from the Same Authors
-
2021 : Offline Reinforcement Learning with Soft Behavior Regularization »
Haoran Xu · Xianyuan Zhan · Li Jianxiong · Honglei Yin -
2021 : Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations »
Haoran Xu · Xianyuan Zhan · Honglei Yin -
2021 : Model-Based Offline Planning with Trajectory Pruning »
Xianyuan Zhan · Xiangyu Zhu · Haoran Xu -
2022 Poster: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
Haoyi Niu · Shubham Sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan -
2022 Poster: A Policy-Guided Imitation Approach for Offline Reinforcement Learning »
Haoran Xu · Li Jiang · Li Jianxiong · Xianyuan Zhan -
2022 : A Versatile and Efficient Reinforcement Learning Approach for Autonomous Driving »
Guan Wang · Haoyi Niu · desheng zhu · Jianming HU · Xianyuan Zhan · Guyue Zhou -
2022 : Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization »
Haoran Xu · Li Jiang · Li Jianxiong · Zhuoran Yang · Zhaoran Wang · Xianyuan Zhan -
2022 : Distance-Sensitive Offline Reinforcement Learning »
Li Jianxiong · Xianyuan Zhan · Haoran Xu · Xiangyu Zhu · Jingjing Liu · Ya-Qin Zhang -
2023 Poster: Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL »
Peng Cheng · Xianyuan Zhan · zhihao wu · Wenjia Zhang · Youfang Lin · Shou cheng Song · Han Wang · Li Jiang -
2023 Poster: Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization »
Xiangsen Wang · Haoran Xu · Yinan Zheng · Xianyuan Zhan -
2022 Spotlight: When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning »
Haoyi Niu · Shubham Sharma · Yiwen Qiu · Ming Li · Guyue Zhou · Jianming HU · Xianyuan Zhan -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · Shubham Sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo