Timezone: »

Teachable Reinforcement Learning via Advice Distillation
Olivia Watkins · Abhishek Gupta · Trevor Darrell · Pieter Abbeel · Jacob Andreas

Tue Dec 07 08:30 AM -- 10:00 AM (PST) @ None #None

Training automated agents to perform complex behaviors in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and provides minimal information per human intervention. Can we overcome these challenges by building agents that learn from rich, interactive feedback? We propose a new supervision paradigm for interactive learning based on teachable decision-making systems, which learn from structured advice provided by an external teacher. We begin by introducing a class of human-in-the-loop decision making problems in which different forms of human provided advice signals are available to the agent to guide learning. We then describe a simple policy learning algorithm that first learns to interpret advice, then learns from advice to target tasks in the absence of human supervision. In puzzle-solving, navigation, and locomotion domains, we show that agents that learn from advice can acquire new skills with significantly less human supervision required than standard reinforcement or imitation learning systems.

Author Information

Olivia Watkins (UC Berkeley)

I'm currently exploring several areas of machine learning, including reinforcement learning, computer vision, and their applications to robotics.

Abhishek Gupta (University of California, Berkeley)
Trevor Darrell (Electrical Engineering & Computer Science Department)
Pieter Abbeel (UC Berkeley & Covariant)

Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse venture fund, Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, Darpa, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter's work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR.

Jacob Andreas (MIT)

More from the Same Authors