NeurIPS Efficient Value Propagation with the Compositional Optimality Equation

Poster in Workshop: Goal-Conditioned Reinforcement Learning

Efficient Value Propagation with the Compositional Optimality Equation

Piotr Piękos · Aditya Ramesh · Francesco Faccio · Jürgen Schmidhuber

Keywords: [ sample efficiency ] [ Reinforcement Learning ] [ goal-conditioned reinforcement learning ]

Abstract:

Goal-Conditioned Reinforcement Learning (GCRL) is about learning to reach predefined goal states. GCRL in the real world is crucial for adaptive robotics. Existing GCRL methods, however, suffer from low sample efficiency and high cost of collecting real-world data. Here we introduce the Compositional Optimality Equation (COE) for a widely used class of deterministic environments in which the reward is obtained only upon reaching a goal state. COE represents a novel alternative to the standard Bellman Optimality Equation, leading to more sample-efficient update rules. The Bellman update combines the immediate reward and the bootstrapped estimate of the best next state. Our COE-based update rule, however, combines the best composition of two bootstrapped estimates reflecting an arbitrary intermediate subgoal state. In tabular settings, the new update rule guarantees convergence to the optimal value function exponentially faster than the Bellman update! COE can also be used to derive compositional variants of conventional (deep) RL. In particular, our COE-based version of DDPG is more sample-efficient than DDPG in a continuous grid world.
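
To make the contrast between the two update rules concrete, here is a minimal tabular sketch in Python. The abstract does not state the COE itself, so the multiplicative form used below, V(s, g) = max over w of V(s, w) * V(w, g) with V(s, g) = gamma^(number of steps from s to g), is an assumption chosen to fit the described setting (deterministic transitions, reward only on reaching the goal). The chain environment, the function names, and the discount value are likewise illustrative and are not taken from the paper.

```python
import numpy as np

GAMMA = 0.9      # discount factor (illustrative choice)
N_STATES = 17    # hypothetical toy environment: directed chain 0 -> 1 -> ... -> 16

# Deterministic transitions: adjacency[s, s2] is True if one action leads from s to s2.
adjacency = np.zeros((N_STATES, N_STATES), dtype=bool)
for s in range(N_STATES - 1):
    adjacency[s, s + 1] = True

# Assumed value convention: V[s, g] = GAMMA ** (steps from s to g), 0 if g is unreachable.
steps = np.arange(N_STATES)[None, :] - np.arange(N_STATES)[:, None]
V_star = np.where(steps >= 0, GAMMA ** np.maximum(steps, 0), 0.0)

# Both rules start from the same knowledge: V[s, s] = 1 plus the one-step transitions.
V_init = np.where(adjacency, GAMMA, 0.0)
np.fill_diagonal(V_init, 1.0)


def bellman_update(V):
    """One synchronous sweep of a one-step (Bellman-style) backup."""
    # candidates[s, s2, g] = V[s2, g] if s2 is a successor of s, else 0.
    candidates = np.where(adjacency[:, :, None], V[None, :, :], 0.0)
    V_new = GAMMA * candidates.max(axis=1)
    np.fill_diagonal(V_new, 1.0)  # a state has already reached itself
    return V_new


def compositional_update(V):
    """One synchronous sweep of the ASSUMED compositional backup.

    V_new[s, g] = max over subgoals w of V[s, w] * V[w, g].
    """
    # products[s, w, g] = V[s, w] * V[w, g]
    products = V[:, :, None] * V[None, :, :]
    return products.max(axis=1)


def sweeps_to_converge(update, V0, target, max_sweeps=1000):
    """Count synchronous sweeps until the table matches the target values."""
    V = V0.copy()
    for k in range(max_sweeps + 1):
        if np.allclose(V, target):
            return k
        V = update(V)
    raise RuntimeError("did not converge within max_sweeps")


bellman_sweeps = sweeps_to_converge(bellman_update, V_init, V_star)
coe_sweeps = sweeps_to_converge(compositional_update, V_init, V_star)
print("one-step sweeps:     ", bellman_sweeps)
print("compositional sweeps:", coe_sweeps)
```

In this sketch each one-step sweep extends the correctly valued horizon by a single transition, whereas each compositional sweep doubles it, because two already-correct estimates are joined at the best intermediate subgoal. On the 17-state chain, where the longest path has 16 steps, that is 15 sweeps against 4. This only illustrates the kind of speed-up the abstract describes for tabular settings; it is not the authors' implementation.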
