Generalized Belief Learning in Multi-Agent Settings
Darius Muglich · Luisa Zintgraf · Christian Schroeder de Witt · Shimon Whiteson · Jakob Foerster

"Self-play" is a common method for constructing solutions in Markov games, where an agent takes control of every player during the learning process and iteratively improves the strategies of all sides. Self-play can yield optimal policies in collaborative settings, but these policies often adopt highly specialized conventions that make playing with a novel partner difficult. To address this, other methods have explored encoding symmetry and convention-awareness into policy training, but this can complicate policy training and can even lead to convergence to sub-optimal policies. To overcome this, we propose moving the learning of conventions to the belief space, so as to leave self-play unburdened and retain optimality. We propose a belief learning paradigm that can maintain beliefs over rollouts of policies not seen at training time, and can thus decode and adapt to novel conventions at test time. We show how our paradigm also promotes explainability and interpretability of otherwise nuanced agent conventions.

Author Information

Darius Muglich (University of British Columbia)
Luisa Zintgraf (University of Oxford)
Christian Schroeder de Witt (University of Oxford)
Shimon Whiteson (University of Oxford)
Jakob Foerster (University of Oxford)

Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the 2020/21 academic year. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field until his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.

