Flow Equivariant World Models: Structured Dynamics Outside the Field of View
Abstract
Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input entrained to self-motion and intertwined with the motion of external objects. These streams obey smooth, time-parameterized symmetries (e.g. translating or expanding optic flow), yet most neural network sequence models ignore this structure and instead laboriously re-learn the same transformations from data. In this work, we introduce 'Flow Equivariant World Models', a framework in which both self-motion and the motion of external objects are unified as one-parameter Lie group 'flows', thereby enabling group equivariance with respect to these ubiquitous transformations. On a 2D partially observed world modeling benchmark, Flow Equivariant World Models learn with an order of magnitude fewer training iterations and outperform a comparable state-of-the-art diffusion-based world-modeling architecture, particularly when there are predictable world dynamics outside the agent's current field of view. The flow equivariant update rule also remains stable over hundreds of future rolled-out timesteps, generating a latent map that is robust to both internal and external motion. Project page: https://flowm-anonymous.github.io/Flow-Equivariant-World-Models/
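
To make the notion of a one-parameter 'flow' and equivariance to it concrete, the following toy sketch may help. It is not the architecture introduced in this work: it assumes a periodic 1D signal, takes the flow to be a circular translation at constant integer velocity, and uses a plain circular convolution as the equivariant map. The names `flow`, `circular_conv`, `signal`, `kernel`, and `velocity` are hypothetical and chosen only for illustration.

```python
def flow(signal, velocity, t):
    """Translate a periodic 1D signal by velocity * t positions (circular shift)."""
    n = len(signal)
    shift = (velocity * t) % n
    return [signal[(i - shift) % n] for i in range(n)]


def circular_conv(signal, kernel):
    """Circular convolution of a periodic 1D signal with a short kernel."""
    n = len(signal)
    return [
        sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
        for i in range(n)
    ]


# Assumed toy data: integers keep every comparison below exact.
signal = [0, 1, 3, 0, 0, 2, 0, 0]
kernel = [1, -2, 1]
velocity = 2

# One-parameter group law: flowing for time 1 and then time 2
# is the same as flowing once for time 3.
composed = flow(flow(signal, velocity, 1), velocity, 2)
direct = flow(signal, velocity, 3)
group_law_holds = composed == direct

# Equivariance: transforming the input and then applying the map
# matches applying the map and then transforming the output.
flow_then_conv = circular_conv(flow(signal, velocity, 3), kernel)
conv_then_flow = flow(circular_conv(signal, kernel), velocity, 3)
is_equivariant = flow_then_conv == conv_then_flow

print(group_law_holds, is_equivariant)
```

Both checks hold for any integer velocity and time in this toy setting, because a circular shift commutes with circular convolution. A map with this property never needs to re-learn a pattern at each shifted position, which is the kind of saving the abstract attributes to flow equivariance.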