Timezone: »

Goal-conditioned Imitation Learning
Yiming Ding · Carlos Florensa · Pieter Abbeel · Mariano Phielipp

Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #229

Designing rewards for Reinforcement Learning (RL) is challenging because it needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able to reach a wide range of configurations, hence setting up a different reward every time might be unpractical. Methods like Hindsight Experience Replay (HER) have recently shown promise to learn policies able to reach many goals, without the need of a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might require many samples to discover how to reach certain areas of the state-space. In this work we propose a novel algorithm goalGAIL, which incorporates demonstrations to drastically speed up the convergence to a policy able to reach any goal, surpassing the performance of an agent trained with other Imitation Learning algorithms. Furthermore, we show our method can also be used when the available expert trajectories do not contain the actions or when the expert is suboptimal, which makes it applicable when only kinesthetic, third person or noisy demonstration is available.

Author Information

Yiming Ding (University of California, Berkeley)
Carlos Florensa (UC Berkeley)
Pieter Abbeel (UC Berkeley & covariant.ai)

Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse venture fund, Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, Darpa, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter's work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR.

Mariano Phielipp (Intel AI Labs)

Dr. Mariano Phielipp works at the Intel AI Lab inside the Intel Artificial Intelligence Products Group. His work includes research and development in deep learning, deep reinforcement learning, machine learning, and artificial intelligence. Since joining Intel, Dr. Phielipp has developed and worked on Computer Vision, Face Recognition, Face Detection, Object Categorization, Recommendation Systems, Online Learning, Automatic Rule Learning, Natural Language Processing, Knowledge Representation, Energy Based Algorithms, and other Machine Learning and AI-related efforts. Dr. Phielipp has also contributed to different disclosure committees, won an Intel division award related to Robotics, and has a large number of patents and pending patents. He has published on NeuriPS, ICML, ICLR, AAAI, IROS, IEEE, SPIE, IASTED, and EUROGRAPHICS-IEEE Conferences and Workshops.

More from the Same Authors