Poster
Bandits with many optimal arms
Rianne de Heide · James Cheshire · Pierre Ménard · Alexandra Carpentier
Keywords: [ Bandits ]
Abstract:
We consider a stochastic bandit problem with a possibly infinite number of arms. We write for the proportion of optimal arms and for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting in terms of the problem parameters (the budget), and . For the objective of minimizing the cumulative regret, we provide a lower bound of order and a UCB-style algorithm with matching upper bound up to a factor of . Our algorithm needs to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to in this setting is impossible. For best-arm identification we also provide a lower bound of order on the probability of outputting a sub-optimal arm where is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order in the exponential, and that does not need or as parameter. Our results apply directly to the three related problems of competing against the -th best arm, identifying an good arm, and finding an arm with mean larger than a quantile of a known order.
Chat is not available.