Poster
Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
Qiwen Cui · Simon Du
Hall J (level 1) #539
Keywords: [ multi-agent reinforcement learning ] [ offline reinforcement learning ] [ Reinforcement Learning Theory ]
Abstract:
This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle which builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than the prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales where is the action size of the -th player and is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class as input and output a strategy that is close to the best strategy in . In this setting, the sample complexity only scales with instead of .
Chat is not available.