
Workshop: Goal-Conditioned Reinforcement Learning

Causality of GCRL: Return to No Future?

Ivana Malenica · Susan Murphy

Keywords: [ structural equation models ] [ Causal Inference ] [ goal-conditioned RL ]


Recent work has demonstrated the remarkable effectiveness of formulating Reinforcement Learning (RL) objectives as supervised learning problems. The primary motivation of goal-conditioned RL (GCRL) is to learn actions that maximize the conditional probability of achieving a desired return. To accomplish this, GCRL strives to estimate the conditional probability of actions (A = a) given states (S = s) and future rewards (R = r), which can be expressed as P(a | s, r) = P(s, a, r)/P(s, r). The optimal action is then chosen to maximize an estimate of P(a | s, r). One critical insight missing from both empirical and theoretical work on GCRL pertains to the causal consequences of incorporating information about the future into the training process. Selection bias is a fundamental obstacle to valid causal inference. It occurs when units in the population are preferentially included or, more broadly, when conditioning on a collider variable. When conditioned on, colliders introduce spurious associations between variables that share a common descendant. This can lead to an agent learning a biased policy based on spurious associations. In this work, we make a first attempt at investigating an important question for safe and robust decision making: what are the causal limitations of GCRL algorithms, and do they result in learning a biased policy? We examine GCRL via experiments in complete (all variables known and measured) and incomplete (unknown or unmeasured variables exist) graphical models.
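The collider mechanism the abstract describes can be seen in a tiny simulation (my own illustrative sketch, not from the paper): if the reward R is a common effect of an independent state S and action A, then A is marginally uninformative about S, but conditioning on R, as an estimate of P(a | s, r) implicitly does, induces a strong spurious association between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural model (hypothetical): state S and action A are
# independent fair coins; the reward R is a collider, i.e. a common
# effect of both (R = 1 when the action "matches" the state).
S = rng.integers(0, 2, n)
A = rng.integers(0, 2, n)      # behavior policy ignores the state
R = (S == A).astype(int)       # reward depends on both parents

# Marginally, the action carries no information about the state:
p_a_given_s = A[S == 1].mean()                  # ~ P(A=1 | S=1) ~ 0.5

# Conditioning on the future reward opens the path S -> R <- A,
# creating a spurious S-A association:
p_a_given_s_r = A[(S == 1) & (R == 1)].mean()   # P(A=1 | S=1, R=1) = 1.0

print(p_a_given_s, p_a_given_s_r)
```

In this toy model the conditional P(A = 1 | S = 1, R = 1) is exactly 1 even though A was generated independently of S, which is the kind of spurious dependence that conditioning on future returns can introduce.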
