
Workshop: Goal-Conditioned Reinforcement Learning

Goal-Conditioned Predictive Coding for Offline Reinforcement Learning

Zilai Zeng · Ce Zhang · Shijie Wang · Chen Sun

Keywords: [ self-supervised representation learning ] [ sequence modeling ] [ offline reinforcement learning ]


Recent work has demonstrated the effectiveness of formulating decision making as a supervised learning problem on offline-collected trajectories. However, the benefits of performing sequence modeling on trajectory data are not yet clear. In this work, we investigate whether sequence modeling can condense trajectories into useful representations that enhance policy learning. To this end, we adopt a two-stage framework that first summarizes trajectories using sequence modeling techniques, then leverages the trajectory representations, together with a desired goal, to learn a policy. This design allows many existing supervised offline RL methods to be viewed as specific instances of our framework. Within this framework, we introduce Goal-Conditioned Predictive Coding (GCPC), an approach that produces powerful trajectory representations and leads to performant policies. Through extensive empirical evaluations on the AntMaze, FrankaKitchen, and Locomotion environments, we observe that sequence modeling has a significant impact on some decision-making tasks. In addition, we demonstrate that GCPC learns a goal-conditioned latent representation of the future trajectory, which enables competitive performance on all three benchmarks.
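The two-stage structure described above can be sketched in miniature. Everything in this snippet is an illustrative assumption, not the paper's actual architecture: a linear token projection with mean pooling stands in for the sequence model in stage one, and a linear head stands in for the goal-conditioned policy in stage two. All names, dimensions, and weights are hypothetical.

```python
import numpy as np

# Illustrative sketch of a two-stage offline RL framework:
#   Stage 1: condense a trajectory of (state, action) pairs into a latent.
#   Stage 2: a policy conditions on the current state, a goal, and that latent.
# The specific modules (linear projection + mean pooling, linear policy head)
# are placeholders for a real sequence model and policy network.

OBS, ACT, GOAL, LATENT, T = 4, 2, 4, 8, 10
rng = np.random.default_rng(0)

# Stage 1: trajectory summarization (placeholder encoder weights).
W_enc = rng.normal(size=(OBS + ACT, LATENT))

def encode_trajectory(states, actions):
    """Summarize a (T, OBS) / (T, ACT) trajectory into one latent vector."""
    tokens = np.concatenate([states, actions], axis=-1) @ W_enc  # (T, LATENT)
    return tokens.mean(axis=0)  # pool over time -> (LATENT,)

# Stage 2: goal-conditioned policy (placeholder policy weights).
W_pi = rng.normal(size=(OBS + GOAL + LATENT, ACT))

def policy(state, goal, latent):
    """Map current state, desired goal, and trajectory latent to an action."""
    return np.concatenate([state, goal, latent]) @ W_pi  # (ACT,)

# Usage on random stand-in data from an "offline" trajectory.
states = rng.normal(size=(T, OBS))
actions = rng.normal(size=(T, ACT))
latent = encode_trajectory(states, actions)
action = policy(states[-1], rng.normal(size=GOAL), latent)
print(latent.shape, action.shape)  # (8,) (2,)
```

The separation of concerns shown here is the point: the encoder is trained on trajectory data alone, and the policy consumes its output alongside the goal, which is what lets different sequence-modeling choices be swapped into stage one.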
