NeurIPS Poster Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning


Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Hao-Lun Hsu · Weixin Wang · Miroslav Pajic · Pan Xu

West Ballroom A-D #6501


Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., $N$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models. Additionally, we establish a connection between our unified framework and the practical application of federated learning.
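
For readers unfamiliar with the two randomization strategies named in the abstract, the following is a minimal single-agent sketch on a plain ridge-regression objective. It is not the authors' CoopTS-PHE or CoopTS-LMC: the function names (`phe_estimate`, `lmc_estimate`), the noise scales, the step size, and the inverse temperature are all illustrative assumptions, and the multi-agent communication step is omitted entirely. PHE is shown as a regression on noise-perturbed targets, and LMC as noisy gradient descent on the same loss.

```python
import numpy as np


def phe_estimate(features, targets, rng, noise_std=1.0, reg=1.0):
    """Perturbed-history style estimate (illustrative, hypothetical helper).

    Solves a ridge regression in which the observed targets and the
    regularizer are perturbed with Gaussian noise, so that repeated calls
    return randomized parameter estimates.
    """
    n, d = features.shape
    noisy_targets = targets + rng.normal(0.0, noise_std, size=n)
    prior_noise = rng.normal(0.0, noise_std, size=d)  # perturbs the regularizer
    gram = features.T @ features + reg * np.eye(d)
    rhs = features.T @ noisy_targets + np.sqrt(reg) * prior_noise
    return np.linalg.solve(gram, rhs)


def lmc_estimate(features, targets, rng, w_init,
                 step_size=1e-3, n_steps=100, inv_temp=1.0, reg=1.0):
    """Langevin Monte Carlo style estimate (illustrative, hypothetical helper).

    Runs noisy gradient descent on the ridge loss
    0.5 * ||X w - y||^2 + 0.5 * reg * ||w||^2.
    """
    w = np.array(w_init, dtype=float)
    for _ in range(n_steps):
        grad = features.T @ (features @ w - targets) + reg * w
        noise = rng.normal(size=w.shape)
        # Langevin step: gradient descent plus scaled Gaussian noise.
        w = w - step_size * grad + np.sqrt(2.0 * step_size / inv_temp) * noise
    return w
```

In both sketches the randomness in the returned parameter vector is what drives exploration: with the noise switched off (`noise_std=0` or a very large `inv_temp`), each function reduces to the ordinary ridge solution.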
