
Workshop: Causal Representation Learning

Learning Endogenous Representation in Reinforcement Learning via Advantage Estimation

Hsiao-Ru Pan · Bernhard Schölkopf

Keywords: [ ExoMDP ] [ Reinforcement Learning ] [ causal effect ]


Recently, it was shown that the advantage function can be understood as quantifying the causal effect of an action on the cumulative reward. However, this connection remained largely analogical, with unclear implications. In the present work, we examine this analogy using the Exogenous Markov Decision Process (ExoMDP) framework, which factorizes an MDP into variables that are causally related to the agent's actions (endogenous) and variables that are beyond the agent's control (exogenous). We demonstrate that the advantage function can be expressed using only the endogenous variables, which is, in general, not possible for the (action-)value function. Through experiments in a toy ExoMDP, we found that estimating the advantage function directly can facilitate learning representations that are invariant to the exogenous variables.
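The abstract's central claim — that the advantage function depends only on the endogenous variables while the Q-function does not — can be illustrated with a minimal tabular sketch. The specific MDP below (a two-state endogenous chain, a two-state exogenous clock, an additive reward split, and value iteration) is an illustrative assumption, not the paper's actual experiment:

```python
import itertools
import numpy as np

# Hypothetical toy ExoMDP (illustrative, not the paper's setup).
# Joint state s = (x, e): x is endogenous (affected by actions),
# e is exogenous (evolves independently of the action).
# Reward decomposes additively: r(x, a, e) = r_en(x, a) + r_ex(e).

GAMMA = 0.9
X, E, A = 2, 2, 2  # endogenous states, exogenous states, actions

def step_x(x, a):   # endogenous dynamics: a=1 flips x, a=0 keeps it
    return x ^ a

def step_e(e):      # exogenous dynamics: e flips every step, ignoring a
    return 1 - e

def reward(x, a, e):
    r_en = float(x == 1)        # endogenous part, depends on (x, a) in general
    r_ex = 2.0 * float(e == 1)  # exogenous part, depends on e only
    return r_en + r_ex

# Value iteration on the joint MDP.
Q = np.zeros((X, E, A))
for _ in range(500):
    V = Q.max(axis=2)
    Q_new = np.zeros_like(Q)
    for x, e, a in itertools.product(range(X), range(E), range(A)):
        Q_new[x, e, a] = reward(x, a, e) + GAMMA * V[step_x(x, a), step_e(e)]
    Q = Q_new

V = Q.max(axis=2)
Adv = Q - V[:, :, None]

# The advantage is invariant to the exogenous state e ...
assert np.allclose(Adv[:, 0, :], Adv[:, 1, :])
# ... while the Q-values themselves are not: they absorb the
# exogenous value V_ex(e), which cancels in Q - V.
assert not np.allclose(Q[:, 0, :], Q[:, 1, :])
```

Because the exogenous dynamics and the exogenous reward term are unaffected by the action, Q(x, e, a) separates into an endogenous part plus a value of e alone; subtracting V(x, e) cancels the exogenous part, which is why the advantage can, in principle, be represented from the endogenous variables only.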
