Spotlight
Evolved Policy Gradients
Rein Houthooft · Yuhua Chen · Phillip Isola · Bradly Stadie · Filip Wolski · OpenAI Jonathan Ho · Pieter Abbeel

Tue Dec 4th 03:30 -- 03:35 PM @ Room 220 E

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms.

Author Information

Rein Houthooft (Happy Elements)
Richard Chen (Happy Elements Inc.)
Phillip Isola (OpenAI)
Bradly Stadie (UC Berkeley)
Filip Wolski (OpenAI)
OpenAI Jonathan Ho (OpenAI, UC Berkeley)
Pieter Abbeel (UC Berkeley | Gradescope | Covariant)

Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Director of the Berkeley AI Research (BAIR) Lab, Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse venture fund, Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, Darpa, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter's work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR.

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors