Poster
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
Benjamin Eysenbach · Xinyang Geng · Sergey Levine · Russ Salakhutdinov

Tue Dec 08 09:00 PM -- 11:00 PM (PST) @ Poster Session 2 #594

Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? Inverse RL answers this question. In this paper we show that inverse RL is a principled mechanism for reusing experience across tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary types of reward functions. Our experiments confirm that relabeling data using inverse RL outperforms prior relabeling methods on goal-reaching tasks, and accelerates learning in more general multi-task settings where prior methods are not applicable, such as domains with discrete sets of rewards and those with linear reward functions.
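As a rough illustration of the relabeling mechanism the abstract describes, a minimal sketch follows. This is hypothetical code, not the authors' implementation: the function name, the reward_fns interface, and the optional log-partition correction are all assumptions. It scores each trajectory under every candidate task's reward function and samples a new task label from a softmax over those scores, which corresponds to the inverse-RL question "for what task was this experience optimal?"

import numpy as np

def relabel_with_inverse_rl(trajectories, reward_fns, log_partition=None, temperature=1.0):
    """Relabel each trajectory with a task for which it was (soft-)optimal.

    Hypothetical sketch of hindsight relabeling via inverse RL:
    sample task ~ q(task | traj) proportional to exp(R_task(traj) - log Z_task).

    trajectories: list of trajectories, each a list of (state, action) pairs.
    reward_fns: dict mapping task id -> callable r(state, action).
    log_partition: optional dict of per-task log partition values; if omitted,
        the correction is dropped, which biases relabeling toward tasks with
        generous reward scales.
    """
    tasks = list(reward_fns)
    relabeled = []
    for traj in trajectories:
        # Trajectory return under each candidate task's reward function.
        returns = np.array([
            sum(reward_fns[task](s, a) for s, a in traj) for task in tasks
        ])
        if log_partition is not None:
            returns = returns - np.array([log_partition[t] for t in tasks])
        # Softmax over tasks: the posterior over which task the data was optimal for.
        logits = returns / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        relabeled.append((np.random.choice(tasks, p=probs), traj))
    return relabeled

Subtracting each task's log partition function keeps tasks with uniformly large rewards from absorbing all relabeled experience; in practice this term is intractable and one common approximation uses learned soft value functions.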

Author Information

Benjamin Eysenbach (Carnegie Mellon University)

Assistant professor at Princeton working on self-supervised reinforcement learning (scaling, algorithms, theory, and applications).

Xinyang Geng (UC Berkeley)
Sergey Levine (UC Berkeley)

Sergey Levine received a BS and MS in Computer Science from Stanford University in 2009, and a Ph.D. in Computer Science from Stanford University in 2014. He joined the faculty of the Department of Electrical Engineering and Computer Sciences at UC Berkeley in fall 2016. His work focuses on machine learning for decision making and control, with an emphasis on deep learning and reinforcement learning algorithms. Applications of his work include autonomous robots and vehicles, as well as other decision-making domains. His research includes developing algorithms for end-to-end training of deep neural network policies that combine perception and control, scalable algorithms for inverse reinforcement learning, and deep reinforcement learning algorithms, among other topics.

Russ Salakhutdinov (Carnegie Mellon University)
