Last-Iterate Guarantees in Noisy Games via BNN Dynamics
Abstract
Learning in games with noisy feedback is a fundamental problem in multi-agent systems, where agents typically observe sampled payoff estimates rather than exact gradients. We develop a theoretical framework for Brown–von Neumann–Nash (BNN) dynamics in two-player zero-sum normal-form games and extend the analysis to extensive-form games, where sampling is unavoidable. We show that the induced stochastic-approximation recursion achieves last-iterate convergence to a neighbourhood of the set of Nash equilibria, quantify the drift induced by the bias of the stochastic payoff estimates, and derive explicit convergence rates. Building on this theory, we instantiate a sample-based actor–critic algorithm that implements BNN dynamics directly within a policy-gradient architecture. Experiments on Kuhn Poker and Leduc Poker validate the theoretical predictions, demonstrating the effectiveness of the approach and the role of critic step sizes in shaping the bias and final accuracy.
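To make the abstract's object of study concrete, the following is a minimal sketch of Euler-discretised BNN dynamics in a two-player zero-sum normal-form game. The matching-pennies payoff matrix, initial strategies, and step size are illustrative choices of ours, not taken from the paper; the sketch uses exact expected payoffs, whereas the paper's setting replaces them with noisy sampled estimates.

```python
import numpy as np

def bnn_step(x, payoff_vec, eta):
    """One Euler step of BNN dynamics on the simplex.

    payoff_vec[i] is the expected payoff of pure strategy i against
    the opponent's current mixed strategy.
    """
    avg = x @ payoff_vec                          # current expected payoff
    excess = np.maximum(payoff_vec - avg, 0.0)    # excess payoffs K_i(x)
    return x + eta * (excess - x * excess.sum())  # x_dot = K - x * sum(K)

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoff matrix
x = np.array([0.9, 0.1])                  # row player's mixed strategy
y = np.array([0.2, 0.8])                  # column player's mixed strategy

def exploitability(x, y):
    # Duality gap (best-response value minus worst case); zero at Nash.
    return (A @ y).max() - (x @ A).min()

gap0 = exploitability(x, y)
for _ in range(50_000):
    # Simultaneous update: the RHS is evaluated before assignment.
    x, y = bnn_step(x, A @ y, 1e-2), bnn_step(y, -(A.T @ x), 1e-2)
print(x, y, exploitability(x, y))
```

Since BNN dynamics are Nash-stationary and zero-sum games are stable games, the (last) iterates spiral towards the unique interior equilibrium and the duality gap shrinks; the paper's analysis characterises how stochastic payoff noise perturbs this picture.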