Workshop: Goal-Conditioned Reinforcement Learning

Goal Misgeneralization as Implicit Goal Conditioning

Diego Dorn · Neel Alex · David Krueger

Keywords: [ goal misgeneralization ] [ Reinforcement Learning ]


While many examples of goal misspecification have been dissected in the reinforcement learning literature, few works have focused on the more recently identified problem of goal misgeneralization. Because goal misgeneralization often stems from underspecification, we explore a simple environment in which some goals can be specified through explicit conditioning and others cannot. We find that agents generally pursue a mixture of the possible goals, and that the choice of which goal to pursue is often inexplicable. Nonetheless, we attempt an explanation via implicit goal conditioning, wherein subtle environment features determine which goal is pursued, and aim to understand which features induce pursuit of one goal over another.
