From Emergence to Intention: A Statistical Inductive Bias for Tractable Optimization in Multi-Agent Coordination
Abstract
Cooperative multi-agent reinforcement learning (MARL) poses a formidable non-convex optimization challenge, exacerbated by the non-stationarity introduced by concurrently learning agents. A common remedy is to enable inter-agent communication so that agents can coordinate. In this paper, we investigate how the structure of this communication affects the tractability of the underlying optimization problem. We first analyze an end-to-end learned protocol, Learned Direct Communication (LDC), in which messages emerge as a byproduct of policy optimization. We then propose \textbf{Intention Communication}, an alternative that structures the optimization by integrating a learned statistical world model: each agent uses the model to generate predictive trajectories and communicates a compressed summary of the resulting plan, i.e., its intent. This design injects a strong inductive bias, reframing the problem from discovering a communication protocol from scratch to learning to interpret explicit plans. We evaluate both methods on a cooperative task under partial observability. Our results show that while emergent communication is viable in simple settings, its performance degrades as the environment scales. In contrast, by structuring the problem with a predictive statistical model, Intention Communication markedly improves optimization stability and sample efficiency, achieving near-optimal performance in complex environments where the alternative methods fail. This work underscores the value of coupling statistical modeling with optimization to solve complex coordination problems.
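To make the mechanism described above concrete, the following is a minimal sketch of the intention-communication loop: roll a learned world model forward under the agent's own policy, then compress the imagined trajectory into a fixed-size message. All names (WorldModel, IntentionEncoder, communicate_intent), network sizes, and the policy interface are illustrative assumptions for exposition, not the paper's actual architecture.

```python
# Hedged sketch of Intention Communication (hypothetical interfaces;
# the abstract does not specify the paper's exact implementation).
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    """Assumed one-step dynamics model: predicts the next latent state
    from the current latent state and an action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class IntentionEncoder(nn.Module):
    """Compresses an imagined trajectory into a fixed-size message,
    here with a GRU whose final hidden state serves as the message."""
    def __init__(self, state_dim, message_dim):
        super().__init__()
        self.rnn = nn.GRU(state_dim, message_dim, batch_first=True)

    def forward(self, trajectory):   # (batch, horizon, state_dim)
        _, h = self.rnn(trajectory)
        return h.squeeze(0)          # (batch, message_dim) message

def communicate_intent(world_model, encoder, policy, state, horizon=5):
    """Roll the world model forward under the agent's own policy and
    encode the predicted trajectory as the outgoing message."""
    imagined, s = [], state
    for _ in range(horizon):
        a = policy(s)            # imagined action from the agent's policy
        s = world_model(s, a)    # predicted next latent state
        imagined.append(s)
    trajectory = torch.stack(imagined, dim=1)
    return encoder(trajectory)
```

Under this sketch, receiving agents would condition their policies on the messages of their teammates, so the optimization reduces to interpreting explicit plans rather than inventing a protocol from scratch, which is the inductive bias the abstract describes.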