Timezone: »
Unsupervised reinforcement learning (URL) poses a promising paradigm to learn useful behaviors in a task-agnostic environment without the guidance of extrinsic rewards to facilitate the fast adaptation of various downstream tasks. Previous works focused on the pre-training in a model-free manner while lacking the study of transition dynamics modeling that leaves a large space for the improvement of sample efficiency in downstream tasks. To this end, we propose an Efficient Unsupervised Reinforcement Learning Framework with Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase, thus better leveraging the environmental samples and improving the downstream task sampling efficiency. However, constructing a generalizable model which captures the local dynamics under different behaviors remains a challenging problem. We introduce the multi-choice dynamics model that covers different local dynamics under different behaviors concurrently, which uses different heads to learn the state transition under different behaviors during unsupervised pre-training and selects the most appropriate head for prediction in the downstream task. Experimental results in the manipulation and locomotion domains demonstrate that EUCLID achieves state-of-the-art performance with high sample efficiency, basically solving the state-based URLB benchmark and reaching a mean normalized score of 104.0±1.2% in downstream tasks with 100k fine-tuning steps, which is equivalent to DDPG’s performance at 2M interactive steps with 20× more data. Codes and visualization videos are released on our homepage.
Author Information
Yifu Yuan (Tianjin University)
Jianye Hao (Tianjin University)
Fei Ni (Tianjin University)
Yao Mu (The University of Hong Kong)
I am currently a Ph.D. Candidate of Computer Science at the University of Hong Kong. I graduated with a Master Degree from Tsinghua University in June 2021. My research interests include Reinforcement Learning, Representation Learning, Autonomous Driving, Optimal Control, and Computer Vision.
YAN ZHENG (Tianjin University)
Yujing Hu (NetEase, Inc.)
Jinyi Liu (Tianjin University)
Yingfeng Chen (Netease Fuxi AI LAB)
Changjie Fan (NetEase Fuxi AI Lab)
More from the Same Authors
-
2021 : OVD-Explorer: A General Information-theoretic Exploration Approach for Reinforcement Learning »
Jinyi Liu · Zhi Wang · YAN ZHENG · Jianye Hao · Junjie Ye · Chenjia Bai · Pengyi Li -
2021 : HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation »
Boyan Li · Hongyao Tang · YAN ZHENG · Jianye Hao · Pengyi Li · Zhaopeng Meng · LI Wang -
2021 : PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration »
Pengyi Li · Hongyao Tang · Tianpei Yang · Xiaotian Hao · Sang Tong · YAN ZHENG · Jianye Hao · Matthew Taylor · Jinyi Liu -
2022 Poster: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Poster: Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning »
Zifan Wu · Chao Yu · Chen Chen · Jianye Hao · Hankz Hankui Zhuo -
2022 Poster: Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing »
Yaodong Yang · Guangyong Chen · Weixun Wang · Xiaotian Hao · Jianye Hao · Pheng-Ann Heng -
2022 Poster: Versatile Multi-stage Graph Neural Network for Circuit Representation »
shuwen yang · Zhihao Yang · Dong Li · Yingxueff Zhang · Zhanguang Zhang · Guojie Song · Jianye Hao -
2022 : Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes »
Min Zhang · Hongyao Tang · Jianye Hao · YAN ZHENG -
2022 : ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation »
Pengyi Li · Hongyao Tang · Jianye Hao · YAN ZHENG · Xian Fu · Zhaopeng Meng -
2022 : SEM2: Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model »
Zeyu Gao · Yao Mu · Ruoyan Shen · Chen Chen · Yangang Ren · Jianyu Chen · Shengbo Li · Ping Luo · Yanfeng Lu -
2022 : Model and Method: Training-Time Attack for Cooperative Multi-Agent Reinforcement Learning »
Siyang Wu · Tonghan Wang · Xiaoran Wu · Jingfeng ZHANG · Yujing Hu · Changjie Fan · Chongjie Zhang -
2022 : Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents »
Minghuan Liu · Zhengbang Zhu · Menghui Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao -
2022 Spotlight: Lightning Talks 6B-3 »
Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu -
2022 Spotlight: MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning »
Yao Lai · Yao Mu · Ping Luo -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning »
Zifan Wu · Chao Yu · Chen Chen · Jianye Hao · Hankz Hankui Zhuo -
2022 Spotlight: DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning »
Yao Mu · Yuzheng Zhuang · Fei Ni · Bin Wang · Jianyu Chen · Jianye Hao · Ping Luo -
2022 Spotlight: GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis »
Yushi Cao · Zhiming Li · Tianpei Yang · Hao Zhang · YAN ZHENG · Yi Li · Jianye Hao · Yang Liu -
2022 Spotlight: Multiagent Q-learning with Sub-Team Coordination »
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Lightning Talks 5A-1 »
Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · shubham sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo -
2022 Spotlight: Lightning Talks 3A-2 »
shuwen yang · Xu Zhang · Delvin Ce Zhang · Lan-Zhe Guo · Renzhe Xu · Zhuoer Xu · Yao-Xiang Ding · Weihan Li · Xingxuan Zhang · Xi-Zhu Wu · Zhenyuan Yuan · Hady Lauw · Yu Qi · Yi-Ge Zhang · Zhihao Yang · Guanghui Zhu · Dong Li · Changhua Meng · Kun Zhou · Gang Pan · Zhi-Fan Wu · Bo Li · Minghui Zhu · Zhi-Hua Zhou · Yafeng Zhang · Yingxueff Zhang · shiwen cui · Jie-Jing Shao · Zhanguang Zhang · Zhenzhe Ying · Xiaolong Chen · Yu-Feng Li · Guojie Song · Peng Cui · Weiqiang Wang · Ming GU · Jianye Hao · Yihua Huang -
2022 Spotlight: Versatile Multi-stage Graph Neural Network for Circuit Representation »
shuwen yang · Zhihao Yang · Dong Li · Yingxueff Zhang · Zhanguang Zhang · Guojie Song · Jianye Hao -
2022 Poster: GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis »
Yushi Cao · Zhiming Li · Tianpei Yang · Hao Zhang · YAN ZHENG · Yi Li · Jianye Hao · Yang Liu -
2022 Poster: DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning »
Yao Mu · Yuzheng Zhuang · Fei Ni · Bin Wang · Jianyu Chen · Jianye Hao · Ping Luo -
2022 Poster: The Policy-gradient Placement and Generative Routing Neural Networks for Chip Design »
Ruoyu Cheng · Xianglong Lyu · Yang Li · Junjie Ye · Jianye Hao · Junchi Yan -
2022 Poster: MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning »
Yao Lai · Yao Mu · Ping Luo -
2021 : HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation Q&A »
Boyan Li · Hongyao Tang · YAN ZHENG · Jianye Hao · Pengyi Li · Zhaopeng Meng · LI Wang -
2021 : HyAR: Addressing Discrete-Continuous Action Reinforcement Learning via Hybrid Action Representation »
Boyan Li · Hongyao Tang · YAN ZHENG · Jianye Hao · Pengyi Li · Zhaopeng Meng · LI Wang -
2021 Poster: Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games »
Xiangyu Liu · Hangtian Jia · Ying Wen · Yujing Hu · Yingfeng Chen · Changjie Fan · ZHIPENG HU · Yaodong Yang -
2021 Poster: Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration »
Lulu Zheng · Jiarui Chen · Jianhao Wang · Jiamin He · Yujing Hu · Yingfeng Chen · Changjie Fan · Yang Gao · Chongjie Zhang -
2021 Poster: Model-Based Reinforcement Learning via Imagination with Derived Memory »
Yao Mu · Yuzheng Zhuang · Bin Wang · Guangxiang Zhu · Wulong Liu · Jianyu Chen · Ping Luo · Shengbo Li · Chongjie Zhang · Jianye Hao -
2021 Poster: An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning »
Tianpei Yang · Weixun Wang · Hongyao Tang · Jianye Hao · Zhaopeng Meng · Hangyu Mao · Dong Li · Wulong Liu · Yingfeng Chen · Yujing Hu · Changjie Fan · Chengwei Zhang -
2020 Poster: Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping »
Yujing Hu · Weixun Wang · Hangtian Jia · Yixiang Wang · Yingfeng Chen · Jianye Hao · Feng Wu · Changjie Fan -
2018 Poster: A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents »
YAN ZHENG · Zhaopeng Meng · Jianye Hao · Zongzhang Zhang · Tianpei Yang · Changjie Fan