Off-Team Learning

Brandon Cui · Hengyuan Hu · Andrei Lupu · Samuel Sokota · Jakob Foerster

Hall J #411

Keywords: [ Cooperative Multi-Agent Reinforcement Learning ] [ Reinforcement Learning ] [ Deep Reinforcement Learning ] [ Multi-Agent Reinforcement Learning ]

Wed 30 Nov 9 a.m. PST — 11 a.m. PST


Zero-shot coordination (ZSC) evaluates an algorithm by the performance of a team of agents that were trained independently under that algorithm. Off-belief learning (OBL) is a recent method that achieves state-of-the-art results in ZSC in the game Hanabi. However, the implementation of OBL relies on a belief model that experiences covariate shift. Moreover, during ad-hoc coordination, an OBL policy, like any other neural policy, may experience test-time covariate shift. We present two methods that address these issues. The first, off-team belief learning (OTBL), attempts to improve the accuracy of the belief model of a target policy π_T on a broader range of inputs by weighting trajectories approximately according to the distribution induced by a different policy π_b. The second, off-team off-belief learning (OT-OBL), attempts to compute an OBL equilibrium in which the fixed-point error is weighted according to the distribution induced by cross-play between the training policy π and a different, fixed policy π_b, rather than by self-play of π. We investigate these methods in variants of Hanabi.
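
As a rough illustration of the ZSC evaluation described above, the following sketch computes a self-play score and a cross-play score from a matrix of pairwise results. It is not taken from the paper: the function name, the matrix layout, and the numbers are all hypothetical, and it assumes two-player teams in which entry [i][j] is the mean episode score when independently trained agent i is paired with agent j.

```python
from statistics import mean


def self_play_and_cross_play(scores):
    """Return (self-play mean, cross-play mean) for a square score matrix.

    scores[i][j] is assumed to be the mean episode score obtained when
    agent i is teamed with agent j, where each agent comes from an
    independent training run of the same algorithm.
    """
    n = len(scores)
    if n < 2 or any(len(row) != n for row in scores):
        raise ValueError("need a square matrix over at least two agents")
    # Diagonal entries: each agent paired with a copy of itself.
    self_play = mean(scores[i][i] for i in range(n))
    # Off-diagonal entries: agents paired with partners from other runs.
    cross_play = mean(
        scores[i][j] for i in range(n) for j in range(n) if i != j
    )
    return self_play, cross_play


# Hypothetical numbers, for illustration only (not results from the paper).
example_scores = [
    [24.0, 20.0, 18.0],
    [21.0, 23.0, 19.0],
    [17.0, 22.0, 25.0],
]
sp, xp = self_play_and_cross_play(example_scores)
print(f"self-play: {sp:.2f}, cross-play: {xp:.2f}")
```

A large gap between the two numbers would suggest that agents rely on conventions specific to their own training run, which is the failure mode ZSC is meant to expose.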
