Federated Model-Based Offline Multi-Agent Reinforcement Learning for Wireless Networks
Abstract
Wireless networks are naturally modeled as cooperative multi-agent reinforcement learning problems: distributed entities act on partial observations under interference and non-stationary traffic while pursuing common network objectives. Online exploration is risky and sample-inefficient in live systems, yet large operational logs are available, motivating an offline MARL approach. We introduce FedMORL, a federated model-based offline framework that shares an environment model rather than policy parameters. Each client learns dynamics and reward predictors from its logs and periodically aggregates them into a shared world model, which each client then uses locally to improve its policy, without further environment interaction, through (i) planner-guided evaluation and (ii) short-horizon rollouts that augment on-support data. The design preserves decentralized execution and limits privacy exposure by avoiding raw-data and policy sharing. Applied to wireless network tasks, FedMORL improves average and tail throughput and reduces collisions and delay compared with rule-based baselines. We also outline the conditions under which sharing a model is most beneficial, namely heterogeneous traffic and limited local coverage, supporting model federation as a practical, privacy-preserving, offline-first path for multi-agent wireless control.
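The following is a minimal illustrative sketch, not the authors' implementation, of the two mechanisms named in the abstract: FedAvg-style aggregation of per-client dynamics and reward predictors into a shared world model, and short-horizon rollouts from logged states that augment a client's offline data. The linear model form, the function names, and the weighting by log size are all assumptions made for illustration.

```python
# Sketch only: assumes a linear world model; the paper does not specify this form.
import numpy as np

def aggregate_world_model(client_models, client_sizes):
    """Weighted average of per-client model parameters (FedAvg-style assumption)."""
    total = float(sum(client_sizes))
    return {
        k: sum(m[k] * (n / total) for m, n in zip(client_models, client_sizes))
        for k in client_models[0].keys()
    }

def rollout_augment(world_model, start_states, policy, horizon=5):
    """Generate short-horizon synthetic transitions with the shared model."""
    W_dyn, W_rew = world_model["dynamics"], world_model["reward"]
    synthetic = []
    for s in start_states:
        for _ in range(horizon):
            a = policy(s)
            sa = np.concatenate([s, a])
            s_next = W_dyn @ sa       # predicted next state (linear dynamics assumed)
            r = float(W_rew @ sa)     # predicted reward
            synthetic.append((s, a, r, s_next))
            s = s_next
    return synthetic

# Toy usage: two clients with locally fitted models on a 4-dim state, 2-dim action
# task; aggregation is weighted toward the client with the larger operational log.
rng = np.random.default_rng(0)
clients = [
    {"dynamics": rng.normal(size=(4, 6)), "reward": rng.normal(size=6)}
    for _ in range(2)
]
shared = aggregate_world_model(clients, client_sizes=[10_000, 2_500])
policy = lambda s: np.clip(s[:2], -1.0, 1.0)  # placeholder decentralized policy
extra = rollout_augment(shared, start_states=[rng.normal(size=4)], policy=policy)
print(f"generated {len(extra)} synthetic transitions")
```

Keeping the rollout horizon short limits compounding model error, which is why the synthetic transitions are treated only as augmentation of on-support data rather than a replacement for the logs.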