Timezone: »

 
Poster
Unsupervised Reinforcement Learning with Contrastive Intrinsic Control
Michael Laskin · Hao Liu · Xue Bin Peng · Denis Yarats · Aravind Rajeswaran · Pieter Abbeel

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #911

We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.

Author Information

Michael Laskin (DeepMind)
Hao Liu (University of California Berkeley)
Xue Bin Peng (University of California, Berkeley)
Denis Yarats (New York University)
Aravind Rajeswaran (FAIR)
Pieter Abbeel (UC Berkeley & Covariant)

Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse venture fund, Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, Darpa, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter's work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR.

More from the Same Authors