Timezone: »

Algorithms for Infinitely Many-Armed Bandits
Yizao Wang · Jean-Yves Audibert · Remi Munos

Tue Dec 09 11:57 AM -- 11:58 AM (PST) @ None

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a new selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matchs (up to logarithmic factors) the upper-bound in some cases.

Author Information

Yizao Wang (University of Michigan)
Jean-Yves Audibert (Université Paris Est)
Remi Munos (Google DeepMind)

More from the Same Authors