EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
Yifu Yuan · Jianye Hao · Fei Ni · Yao Mu · YAN ZHENG · Yujing Hu · Jinyi Liu · Yingfeng Chen · Changjie Fan
Event URL: https://openreview.net/forum?id=9-tjK93-rP

Unsupervised reinforcement learning (URL) is a promising paradigm for learning useful behaviors in a task-agnostic environment without extrinsic rewards, so that an agent can adapt quickly to various downstream tasks. Previous works focused on pre-training in a model-free manner and did not study transition dynamics modeling, which leaves considerable room for improving sample efficiency in downstream tasks. To this end, we propose an Efficient Unsupervised Reinforcement Learning Framework with Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused paradigm that jointly pre-trains the dynamics model and the unsupervised exploration policy, thus making better use of environmental samples and improving sample efficiency on downstream tasks. However, constructing a generalizable model that captures the local dynamics under different behaviors remains a challenging problem. We therefore introduce a multi-choice dynamics model that covers the local dynamics under different behaviors concurrently: it uses different heads to learn the state transitions under different behaviors during unsupervised pre-training and selects the most appropriate head for prediction in the downstream task. Experimental results in the manipulation and locomotion domains demonstrate that EUCLID achieves state-of-the-art performance with high sample efficiency, basically solving the state-based URLB benchmark and reaching a mean normalized score of 104.0±1.2% in downstream tasks with 100k fine-tuning steps, which is equivalent to DDPG’s performance at 2M interactive steps with 20× more data. Code and visualization videos are released on our homepage.
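
The multi-head idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `MultiChoiceDynamics`, the linear trunk and heads, the residual prediction, and in particular the selection rule (lowest one-step prediction error on a batch of downstream transitions) are all assumptions made for illustration, since the abstract does not specify how the head is chosen.

```python
import numpy as np


class MultiChoiceDynamics:
    """Toy multi-head dynamics model (hypothetical, for illustration only).

    A shared trunk encodes (state, action); each head predicts the
    next state for one behavior seen during pre-training.
    """

    def __init__(self, state_dim, action_dim, hidden_dim, num_heads, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim
        # Shared trunk weights and one output matrix per head.
        self.trunk = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
        self.heads = rng.normal(scale=0.1, size=(num_heads, hidden_dim, state_dim))

    def predict(self, states, actions, head):
        """Predict next states with the given head (residual form, an assumption)."""
        x = np.concatenate([states, actions], axis=-1)
        h = np.tanh(x @ self.trunk)
        return states + h @ self.heads[head]

    def head_errors(self, states, actions, next_states):
        """Mean squared one-step prediction error of every head on a batch."""
        return np.array([
            np.mean((self.predict(states, actions, k) - next_states) ** 2)
            for k in range(len(self.heads))
        ])

    def select_head(self, states, actions, next_states):
        """Assumed selection rule: pick the head with the lowest batch error."""
        return int(np.argmin(self.head_errors(states, actions, next_states)))
```

In this sketch, a downstream task would collect a small batch of transitions, call `select_head` once, and then use only that head for model-based prediction during fine-tuning.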

Author Information

Yifu Yuan (Tianjin University)
Jianye Hao (Tianjin University)
Fei Ni (Tianjin University)
Yao Mu (The University of Hong Kong)

I am currently a Ph.D. candidate in Computer Science at the University of Hong Kong. I graduated with a master's degree from Tsinghua University in June 2021. My research interests include Reinforcement Learning, Representation Learning, Autonomous Driving, Optimal Control, and Computer Vision.

YAN ZHENG (Tianjin University)
Yujing Hu (NetEase, Inc.)
Jinyi Liu (Tianjin University)
Yingfeng Chen (NetEase Fuxi AI Lab)
Changjie Fan (NetEase Fuxi AI Lab)

More from the Same Authors