PolicyGRID: Acting to Understand, Understanding to Act
Abstract
Embodied agents require internal models that support interventional reasoning, not merely correlational prediction. We present PolicyGRID, an embodied world model that learns causal structure online through its own actions. Unlike approaches that treat causal discovery as a preprocessing step, PolicyGRID integrates causal learning directly into the policy loop: agents actively probe the environment to resolve causal uncertainty while simultaneously optimizing competing objectives. This lets agents revise their causal understanding as they act, expanding their behavioral repertoire beyond correlation-driven policies. The framework addresses a fundamental challenge in embodied AI: how can agents maintain reliable world models when their own interventions continuously shift the data distribution? We validate PolicyGRID on building control across synthetic simulations, public datasets, and a real deployment. It achieves F1 = 0.89 under real-world conditions and 2.8x higher policy performance than baselines, demonstrating that embedding causal reasoning directly into the policy loop yields more robust, adaptive behavior than correlation-driven world models.
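The core idea of acting to resolve causal uncertainty while optimizing reward can be illustrated with a minimal sketch. The abstract does not specify PolicyGRID's actual algorithm, so the agent below is a hypothetical stand-in: it keeps a Beta posterior over each action's causal effect and selects actions by posterior mean plus an uncertainty bonus, so probing and exploitation happen in one loop. All class and variable names (`CausalProbingAgent`, `bonus`, `true_p`) are illustrative assumptions, not the paper's method.

```python
import math
import random

class CausalProbingAgent:
    """Toy agent: a Beta posterior per action's binary effect; acts to
    balance expected reward against remaining causal uncertainty.
    This is an illustrative sketch, not PolicyGRID itself."""

    def __init__(self, n_actions, bonus=1.0):
        self.alpha = [1.0] * n_actions  # observed successes + 1 (prior)
        self.beta = [1.0] * n_actions   # observed failures + 1 (prior)
        self.bonus = bonus              # weight on the probing incentive

    def uncertainty(self, a):
        # Variance of the Beta(alpha, beta) posterior:
        # high variance means this action's causal effect is unresolved.
        al, be = self.alpha[a], self.beta[a]
        return al * be / ((al + be) ** 2 * (al + be + 1))

    def act(self):
        # Score = posterior mean reward + uncertainty bonus,
        # so the agent probes poorly-understood actions early and
        # exploits well-understood, effective ones later.
        scores = [
            self.alpha[a] / (self.alpha[a] + self.beta[a])
            + self.bonus * math.sqrt(self.uncertainty(a))
            for a in range(len(self.alpha))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, a, outcome):
        # Interventional data: update the posterior of the action taken.
        if outcome:
            self.alpha[a] += 1.0
        else:
            self.beta[a] += 1.0

# Hypothetical environment: action 2 is the most causally effective.
random.seed(0)
true_p = [0.2, 0.5, 0.8]
agent = CausalProbingAgent(n_actions=3)
for _ in range(500):
    a = agent.act()
    agent.update(a, random.random() < true_p[a])

best = max(
    range(3),
    key=lambda a: agent.alpha[a] / (agent.alpha[a] + agent.beta[a]),
)
print(best)
```

Because the uncertainty bonus shrinks as an action accumulates interventional evidence, the loop naturally shifts from probing to exploitation without a separate causal-discovery phase, which is the property the abstract attributes to integrating causal learning into the policy loop.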