Abstract:
We consider the problem where agents collaboratively interact with an instance of a stochastic arm bandit problem for . The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of , communicates for steps and broadcasts bits in each communication step. We extend the work to sparse graphs with maximum degree and diameter to propose LCC-UCB-GRAPH which enjoys a regret bound of . Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.
Chat is not available.