Timezone: »
Offline reinforcement learning (RL) tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment. Despite the potential to surpass the behavioral policies, RL-based methods are generally impractical due to the training instability and bootstrapping the extrapolation errors, which always require careful hyperparameter tuning via online evaluation. In contrast, offline imitation learning (IL) has no such issues since it learns the policy directly without estimating the value function by bootstrapping. However, IL is usually limited in the capability of the behavioral policy and tends to learn a mediocre behavior from the dataset collected by the mixture of policies. In this paper, we aim to take advantage of IL but mitigate such a drawback. Observing that behavior cloning is able to imitate neighboring policies with less data, we propose \textit{Curriculum Offline Imitation Learning (COIL)}, which utilizes an experience picking strategy to make the agent imitate from adaptive neighboring policies with a higher return, and improves the current policy along curriculum stages. On continuous control benchmarks, we compare COIL against both imitation-based methods and RL-based methods, showing that COIL not only avoids just learning a mediocre behavior on mixed datasets but is also even competitive with state-of-the-art offline RL methods.
Author Information
Minghuan Liu (Shanghai Jiao Tong University)
Hanye Zhao (Shanghai Jiao Tong University, Tsinghua University)
Zhengyu Yang (Shanghai Jiao Tong University)
Jian Shen (Shanghai Jiao Tong University)
Weinan Zhang (Shanghai Jiao Tong University)
Li Zhao (Microsoft Research)
Tie-Yan Liu (Microsoft Research Asia)
More from the Same Authors
-
2022 Poster: Learning Enhanced Representation for Tabular Data via Neighborhood Propagation »
Kounianhua Du · Weinan Zhang · Ruiwen Zhou · Yangkun Wang · Xilong Zhao · Jiarui Jin · Quan Gan · Zheng Zhang · David P Wipf -
2022 Poster: Quantized Training of Gradient Boosting Decision Trees »
Yu Shi · Guolin Ke · Zhuoming Chen · Shuxin Zheng · Tie-Yan Liu -
2022 Poster: An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context »
Xiaoyu Chen · Xiangming Zhu · Yufeng Zheng · Pushi Zhang · Li Zhao · Wenxue Cheng · Peng CHENG · Yongqiang Xiong · Tao Qin · Jianyu Chen · Tie-Yan Liu -
2022 Poster: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 : Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management »
Yuandong Ding · Mingxiao Feng · Guozi Liu · Wei Jiang · Chuheng Zhang · Li Zhao · Lei Song · Houqiang Li · Yan Jin · Jiang Bian -
2022 : Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management »
Yuandong Ding · Mingxiao Feng · Guozi Liu · Wei Jiang · Chuheng Zhang · Li Zhao · Lei Song · Houqiang Li · Yan Jin · Jiang Bian -
2022 : Visual Imitation Learning with Patch Rewards »
Minghuan Liu · Tairan He · Weinan Zhang · Shuicheng Yan · Zhongwen Xu -
2022 : Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents »
Minghuan Liu · Zhengbang Zhu · Menghui Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao -
2022 Spotlight: Lightning Talks 6B-4 »
Junjie Chen · Chuanxia Zheng · JINLONG LI · Yu Shi · Shichao Kan · Yu Wang · Fermín Travi · Ninh Pham · Lei Chai · Guobing Gan · Tung-Long Vuong · Gonzalo Ruarte · Tao Liu · Li Niu · Jingjing Zou · Zequn Jie · Peng Zhang · Ming LI · Yixiong Liang · Guolin Ke · Jianfei Cai · Gaston Bujia · Sunzhu Li · Siyuan Zhou · Jingyang Lin · Xu Wang · Min Li · Zhuoming Chen · Qing Ling · Xiaolin Wei · Xiuqing Lu · Shuxin Zheng · Dinh Phung · Yigang Cen · Jianlou Si · Juan Esteban Kamienkowski · Jianxin Wang · Chen Qian · Lin Ma · Benyou Wang · Yingwei Pan · Tie-Yan Liu · Liqing Zhang · Zhihai He · Ting Yao · Tao Mei -
2022 Spotlight: Lightning Talks 6A-2 »
Yichuan Mo · Botao Yu · Gang Li · Zezhong Xu · Haoran Wei · Arsene Fansi Tchango · Raef Bassily · Haoyu Lu · Qi Zhang · Songming Liu · Mingyu Ding · Peiling Lu · Yifei Wang · Xiang Li · Dongxian Wu · Ping Guo · Wen Zhang · Hao Zhongkai · Mehryar Mohri · Rishab Goel · Yisen Wang · Yifei Wang · Yangguang Zhu · Zhi Wen · Ananda Theertha Suresh · Chengyang Ying · Yujie Wang · Peng Ye · Rui Wang · Nanyi Fei · Hui Chen · Yiwen Guo · Wei Hu · Chenglong Liu · Julien Martel · Yuqi Huo · Wu Yichao · Hang Su · Yisen Wang · Peng Wang · Huajun Chen · Xu Tan · Jun Zhu · Ding Liang · Zhiwu Lu · Joumana Ghosn · Shanshan Zhang · Wei Ye · Ze Cheng · Shikun Zhang · Tao Qin · Tie-Yan Liu -
2022 Spotlight: Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation »
Botao Yu · Peiling Lu · Rui Wang · Wei Hu · Xu Tan · Wei Ye · Shikun Zhang · Tao Qin · Tie-Yan Liu -
2022 Spotlight: Quantized Training of Gradient Boosting Decision Trees »
Yu Shi · Guolin Ke · Zhuoming Chen · Shuxin Zheng · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4B-4 »
Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4A-3 »
Zhihan Gao · Yabin Wang · Xingyu Qu · Luziwei Leng · Mingqing Xiao · Bohan Wang · Yu Shen · Zhiwu Huang · Xingjian Shi · Qi Meng · Yupeng Lu · Diyang Li · Qingyan Meng · Kaiwei Che · Yang Li · Hao Wang · Huishuai Zhang · Zongpeng Zhang · Kaixuan Zhang · Xiaopeng Hong · Xiaohan Zhao · Di He · Jianguo Zhang · Yaofeng Tu · Bin Gu · Yi Zhu · Ruoyu Sun · Yuyang (Bernie) Wang · Zhouchen Lin · Qinghu Meng · Wei Chen · Wentao Zhang · Bin CUI · Jie Cheng · Zhi-Ming Ma · Mu Li · Qinghai Guo · Dit-Yan Yeung · Tie-Yan Liu · Jianxing Liao -
2022 Spotlight: Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret »
Jiawei Huang · Li Zhao · Tao Qin · Wei Chen · Nan Jiang · Tie-Yan Liu -
2022 Spotlight: Does Momentum Change the Implicit Regularization on Separable Data? »
Bohan Wang · Qi Meng · Huishuai Zhang · Ruoyu Sun · Wei Chen · Zhi-Ming Ma · Tie-Yan Liu -
2022 Spotlight: BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis »
Yichong Leng · Zehua Chen · Junliang Guo · Haohe Liu · Jiawei Chen · Xu Tan · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu -
2022 Spotlight: Lightning Talks 4A-1 »
Jiawei Huang · Su Jia · Abdurakhmon Sadiev · Ruomin Huang · Yuanyu Wan · Denizalp Goktas · Jiechao Guan · Andrew Li · Wei-Wei Tu · Li Zhao · Amy Greenwald · Jiawei Huang · Dmitry Kovalev · Yong Liu · Wenjie Liu · Peter Richtarik · Lijun Zhang · Zhiwu Lu · R Ravi · Tao Qin · Wei Chen · Hu Ding · Nan Jiang · Tie-Yan Liu -
2022 Poster: Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning »
Hua Wei · Jingxiao Chen · Xiyang Ji · Hongyang Qin · Minwen Deng · Siqin Li · Liang Wang · Weinan Zhang · Yong Yu · Liu Linc · Lanxiao Huang · Deheng Ye · Qiang Fu · Wei Yang -
2022 Poster: Reinforcement Learning with Automated Auxiliary Loss Search »
Tairan He · Yuge Zhang · Kan Ren · Minghuan Liu · Che Wang · Weinan Zhang · Yuqing Yang · Dongsheng Li -
2022 Poster: Does Momentum Change the Implicit Regularization on Separable Data? »
Bohan Wang · Qi Meng · Huishuai Zhang · Ruoyu Sun · Wei Chen · Zhi-Ming Ma · Tie-Yan Liu -
2022 Poster: Your Transformer May Not be as Powerful as You Expect »
Shengjie Luo · Shanda Li · Shuxin Zheng · Tie-Yan Liu · Liwei Wang · Di He -
2022 Poster: BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis »
Yichong Leng · Zehua Chen · Junliang Guo · Haohe Liu · Jiawei Chen · Xu Tan · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu -
2022 Poster: Bootstrapped Transformer for Offline Reinforcement Learning »
Kerong Wang · Hanye Zhao · Xufang Luo · Kan Ren · Weinan Zhang · Dongsheng Li -
2022 Poster: PerfectDou: Dominating DouDizhu with Perfect Information Distillation »
Guan Yang · Minghuan Liu · Weijun Hong · Weinan Zhang · Fei Fang · Guangjun Zeng · Yue Lin -
2022 Poster: NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning »
Rong-Jun Qin · Xingyuan Zhang · Songyi Gao · Xiong-Hui Chen · Zewen Li · Weinan Zhang · Yang Yu -
2022 Poster: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem »
Muning Wen · Jakub Kuba · Runji Lin · Weinan Zhang · Ying Wen · Jun Wang · Yaodong Yang -
2022 Poster: Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation »
Botao Yu · Peiling Lu · Rui Wang · Wei Hu · Xu Tan · Wei Ye · Shikun Zhang · Tao Qin · Tie-Yan Liu -
2021 : AI X Science »
Tie-Yan Liu -
2021 Poster: On the Generative Utility of Cyclic Conditionals »
Chang Liu · Haoyue Tang · Tao Qin · Jintao Wang · Tie-Yan Liu -
2021 Poster: Speech-T: Transducer for Text to Speech and Beyond »
Jiawei Chen · Xu Tan · Yichong Leng · Jin Xu · Guihua Wen · Tao Qin · Tie-Yan Liu -
2021 Poster: Stylized Dialogue Generation with Multi-Pass Dual Learning »
Jinpeng Li · Yingce Xia · Rui Yan · Hongda Sun · Dongyan Zhao · Tie-Yan Liu -
2021 Poster: Distributional Reinforcement Learning for Multi-Dimensional Reward Functions »
Pushi Zhang · Xiaoyu Chen · Li Zhao · Wei Xiong · Tao Qin · Tie-Yan Liu -
2021 Poster: Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD »
Bohan Wang · Huishuai Zhang · Jieyu Zhang · Qi Meng · Wei Chen · Tie-Yan Liu -
2021 Poster: Co-evolution Transformer for Protein Contact Prediction »
He Zhang · Fusong Ju · Jianwei Zhu · Liang He · Bin Shao · Nanning Zheng · Tie-Yan Liu -
2021 Poster: On Effective Scheduling of Model-based Reinforcement Learning »
Hang Lai · Jian Shen · Weinan Zhang · Yimin Huang · Xing Zhang · Ruiming Tang · Yong Yu · Zhenguo Li -
2021 Poster: Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding »
Shengjie Luo · Shanda Li · Tianle Cai · Di He · Dinglan Peng · Shuxin Zheng · Guolin Ke · Liwei Wang · Tie-Yan Liu -
2021 Poster: Learning Causal Semantic Representation for Out-of-Distribution Prediction »
Chang Liu · Xinwei Sun · Jindong Wang · Haoyue Tang · Tao Li · Tao Qin · Wei Chen · Tie-Yan Liu -
2021 Poster: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning »
Jongjin Park · Younggyo Seo · Chang Liu · Li Zhao · Tao Qin · Jinwoo Shin · Tie-Yan Liu -
2021 Poster: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition »
Yichong Leng · Xu Tan · Linchen Zhu · Jin Xu · Renqian Luo · Linquan Liu · Tao Qin · Xiangyang Li · Edward Lin · Tie-Yan Liu -
2021 Poster: Do Transformers Really Perform Badly for Graph Representation? »
Chengxuan Ying · Tianle Cai · Shengjie Luo · Shuxin Zheng · Guolin Ke · Di He · Yanming Shen · Tie-Yan Liu -
2021 Poster: R-Drop: Regularized Dropout for Neural Networks »
xiaobo liang · Lijun Wu · Juntao Li · Yue Wang · Qi Meng · Tao Qin · Wei Chen · Min Zhang · Tie-Yan Liu -
2021 Poster: Recovering Latent Causal Factor for Generalization to Distributional Shifts »
Xinwei Sun · Botong Wu · Xiangyu Zheng · Chang Liu · Wei Chen · Tao Qin · Tie-Yan Liu -
2020 Poster: Efficient Projection-free Algorithms for Saddle Point Problems »
Cheng Chen · Luo Luo · Weinan Zhang · Yong Yu -
2020 Poster: RD$^2$: Reward Decomposition with Representation Decomposition »
Zichuan Lin · Derek Yang · Li Zhao · Tao Qin · Guangwen Yang · Tie-Yan Liu -
2020 Poster: Model-based Policy Optimization with Unsupervised Model Adaptation »
Jian Shen · Han Zhao · Weinan Zhang · Yong Yu -
2020 Spotlight: Model-based Policy Optimization with Unsupervised Model Adaptation »
Jian Shen · Han Zhao · Weinan Zhang · Yong Yu -
2019 Poster: Neural Machine Translation with Soft Prototype »
Yiren Wang · Yingce Xia · Fei Tian · Fei Gao · Tao Qin · Cheng Xiang Zhai · Tie-Yan Liu -
2019 Poster: FastSpeech: Fast, Robust and Controllable Text to Speech »
Yi Ren · Yangjun Ruan · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu -
2019 Poster: Fully Parameterized Quantile Function for Distributional Reinforcement Learning »
Derek Yang · Li Zhao · Zichuan Lin · Tao Qin · Jiang Bian · Tie-Yan Liu -
2019 Poster: Distributional Reward Decomposition for Reinforcement Learning »
Zichuan Lin · Li Zhao · Derek Yang · Tao Qin · Tie-Yan Liu · Guangwen Yang -
2019 Poster: Normalization Helps Training of Quantized LSTM »
Lu Hou · Jinhua Zhu · James Kwok · Fei Gao · Tao Qin · Tie-Yan Liu -
2017 Demonstration: MAgent: A Many-Agent Reinforcement Learning Research Platform for Artificial Collective Intelligence »
Lianmin Zheng · Jiacheng Yang · Han Cai · Weinan Zhang · Jun Wang · Yong Yu -
2017 Poster: Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting »
Yue Wang · Wei Chen · Yuting Liu · Zhi-Ming Ma · Tie-Yan Liu -
2017 Poster: Deliberation Networks: Sequence Generation Beyond One-Pass Decoding »
Yingce Xia · Fei Tian · Lijun Wu · Jianxin Lin · Tao Qin · Nenghai Yu · Tie-Yan Liu -
2017 Poster: LightGBM: A Highly Efficient Gradient Boosting Decision Tree »
Guolin Ke · Qi Meng · Thomas Finley · Taifeng Wang · Wei Chen · Weidong Ma · Qiwei Ye · Tie-Yan Liu -
2013 Poster: Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising »
Min Xu · Tao Qin · Tie-Yan Liu