
Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad"

Imitation from Observation With Bootstrapped Contrastive Learning

Medric Sonwa · Johanna Hansen · Eugene Belilovsky


Imitation from observation is a paradigm in which agents are trained from observations of expert demonstrations, without direct access to the expert's actions. Depending on the problem configuration, these demonstrations can be sequences of states or raw visual observations. One of the most common approaches to this problem is to learn a reward function from the demonstrations, but doing so remains a significant challenge. We approach this problem by representing agent behavior in a latent space learned from demonstration videos. Our approach exploits recent contrastive learning algorithms for images and videos and uses a bootstrapping method to progressively train a trajectory encoding function as the agent policy changes. This function is then used to compute the rewards provided to a standard Reinforcement Learning (RL) algorithm. Our method uses only a limited number of videos produced by an expert, and we do not have access to the expert policy function. Our experiments show promising results on a set of continuous control tasks and demonstrate that learning a behavior encoder from videos allows building an efficient reward function for the agent.
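The abstract does not specify the contrastive objective or the exact form of the reward, so the following is only an illustrative sketch and not the authors' implementation. It assumes that trajectories have already been mapped to fixed-size embedding vectors by some encoder, that the encoder is trained with an InfoNCE-style loss, and that the reward is the cosine similarity between an agent trajectory embedding and the mean expert embedding. The function names `info_nce_loss` and `latent_reward`, the temperature value, and the choice of cosine similarity are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed objective, not from the paper).

    Row i of `anchors` is treated as matching row i of `positives`;
    every other row of `positives` serves as a negative for that anchor.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def latent_reward(agent_embedding, expert_embeddings):
    """Hypothetical reward: cosine similarity in the latent space between
    the agent's trajectory embedding and the mean expert embedding."""
    target = l2_normalize(l2_normalize(expert_embeddings).mean(axis=0))
    return float(l2_normalize(agent_embedding) @ target)
```

Under these assumptions, a reward close to 1 means the agent's trajectory embedding points in the same direction as the expert embeddings, and the scalar returned by `latent_reward` could be passed to any standard RL algorithm as the reward signal. The bootstrapping step described in the abstract, in which the encoder is retrained as the agent policy changes, is not shown here.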
