Skip to yearly menu bar Skip to main content

Workshop: Deep Reinforcement Learning

C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks

Tianjun Zhang · Ben Eysenbach · Russ Salakhutdinov · Sergey Levine · Joseph Gonzalez


Goal-conditioned reinforcement learning (RL) has shown great success recently at solving a wide range of tasks(e.g., navigation, robotic manipulation). However, learning to reach distant goals remains a central challenge to the field, and the task is particularly hard without any expert demonstrations and reward shaping. In this paper, we propose to solve the distant goal-reaching task by using search at training time to generate a curriculum of intermediate states. Specifically, we introduce the algorithm Classifier Planning (C-Planning) by framing the learning of the goal-conditioned policies as variational inference. C-Planningnaturally follows expectation maximization (EM): the E step corresponds to planning an optimal sequence of waypoints using graph search, while the M step corresponds to learning a goal-conditioned policy to reach those waypoints. One essential difficulty of designing such an algorithm is accurately modeling the distribution over waypoints to sample from. In C-Planning, we propose to sample the waypoints using contrastive methods to learn a value function. Unlike prior methods that combine goal-conditioned RL with graph search, ours performs search only during training and not testing, significantly decreasing the compute costs of deploying the learned policy. Empirically, we demonstrate that our method not only improves the sample efficiency of prior methods but also successfully solves temporally-extended navigation and manipulation tasks, where prior goal-conditioned RL methods (including those based on graph search) fail to solve.

Chat is not available.