NeurIPS Poster Bandits with many optimal arms

Rianne de Heide · James Cheshire · Pierre Ménard · Alexandra Carpentier

Keywords: [ Bandits ]

Abstract: We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting and in the best-arm identification setting, in terms of the problem parameters $T$ (the budget), $p^*$ and $\Delta$. For the objective of minimizing the cumulative regret, we provide a lower bound of order $\Omega(\log(T)/(p^*\Delta))$ and a UCB-style algorithm with matching upper bound up to a factor of $\log(1/\Delta)$. Our algorithm needs $p^*$ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to $p^*$ in this setting is impossible. For best-arm identification we also provide a lower bound of order $\Omega(\exp(-cT\Delta^2p^*))$ on the probability of outputting a sub-optimal arm, where $c>0$ is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order $\log(T)$ in the exponential, and that does not need $p^*$ or $\Delta$ as parameter. Our results apply directly to the three related problems of competing against the $j$-th best arm, identifying an $\epsilon$-good arm, and finding an arm with mean larger than a quantile of a known order.
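To make the role of $p^*$ concrete, here is a minimal Python sketch of one generic way a UCB-style method can use it: draw enough arms uniformly at random that at least one is optimal with probability at least $1 - 1/T$, then run standard UCB1 on that subsample. This is an illustration under stated assumptions, not the authors' algorithm; the abstract does not give its details. The names `subsample_size`, `subsampled_ucb` and the `draw_arm` callback are hypothetical, and rewards are assumed to lie in $[0, 1]$.

```python
import math
import random


def subsample_size(p_star, horizon):
    """Number of uniform draws K so that P(no optimal arm drawn) <= 1/horizon.

    Uses (1 - p)^K <= exp(-p * K), so K >= log(horizon) / p suffices.
    """
    return max(1, math.ceil(math.log(horizon) / p_star))


def subsampled_ucb(draw_arm, p_star, horizon, rng):
    """Run UCB1 on a random subsample of arms and return the observed rewards.

    draw_arm(rng) is a hypothetical callback that samples one arm from the
    (possibly infinite) reservoir and returns a function pull(rng) -> reward.
    Rewards are assumed to lie in [0, 1].
    """
    k = min(subsample_size(p_star, horizon), horizon)
    arms = [draw_arm(rng) for _ in range(k)]
    counts = [0] * k
    sums = [0.0] * k
    rewards = []
    for t in range(horizon):
        if t < k:
            i = t  # initialization: pull each sampled arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t + 1) / counts[a]),
            )
        r = arms[i](rng)
        counts[i] += 1
        sums[i] += r
        rewards.append(r)
    return rewards


def make_bernoulli_reservoir(p_star, best_mean, other_mean):
    """Toy reservoir: an arm is optimal with probability p_star (assumption)."""
    def draw_arm(rng):
        mean = best_mean if rng.random() < p_star else other_mean
        return lambda r: 1.0 if r.random() < mean else 0.0
    return draw_arm
```

For example, `subsampled_ucb(make_bernoulli_reservoir(0.2, 0.9, 0.5), 0.2, 2000, random.Random(0))` returns 2000 Bernoulli rewards. The subsample size grows like $\log(T)/p^*$, which is why a method of this kind cannot be calibrated without knowing $p^*$ or a lower bound on it.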
