NeurIPS Poster Multi-Agent Multi-Armed Bandits with Limited Communication

Mridul Agarwal · Vaneet Aggarwal · Kamyar Azizzadenesheli

Hall J (level 1) #1008

Keywords: [ JMLR ] [ Journal Track ]

Abstract: We consider the problem where $N$ agents collaboratively interact with an instance of a stochastic $K$-armed bandit problem with $K \gg N$. The agents aim to simultaneously minimize three quantities: the cumulative regret summed over all agents over $T$ time steps, the number of communication rounds, and the number of bits sent in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm in which each agent communicates only at the end of each epoch and shares the index of the best arm it knows. With LCC-UCB, each agent enjoys a regret of $\tilde{O}\left(\sqrt{(K/N + N)T}\right)$, communicates in only $O(\log T)$ steps, and broadcasts $O(\log K)$ bits in each communication step. We extend the work to sparse graphs with maximum degree $K_G$ and diameter $D$ and propose LCC-UCB-GRAPH, which enjoys a regret bound of $\tilde{O}\left(D\sqrt{(K/N + K_G)DT}\right)$. Finally, we empirically show that LCC-UCB and LCC-UCB-GRAPH perform well and outperform strategies that communicate through a central node.