A common trope in sci-fi is to have a robot that can quickly solve some problem after watching a person, studying a video, or reading a book. While these settings are (currently) fictional, the benefits are real. Agents that can solve tasks by observing others have the potential to greatly reduce the burden of their human teachers, removing some of the need to hand-specify rewards or goals. In this talk, I consider the question of how an agent can not only learn by observing others, but also how it can learn quickly by training offline before taking any steps in the environment. First, I will describe an approach that trains a latent policy directly from state observations, which can then be quickly mapped to real actions in the agent’s environment. Then I will describe how we can train a novel value function, Q(s,s’), to learn off-policy from observations. Unlike previous imitation from observation approaches, this formulation goes beyond simply imitating and rather enables learning from potentially suboptimal observations.
Ashley Edwards (Uber AI Labs)
More from the Same Authors
2020 : Panel discussion »
Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel