Timezone: »

Pure Exploration with Multiple Correct Answers
Rémy Degenne · Wouter Koolen

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #8

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the existing Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

Author Information

Rémy Degenne (Centrum Wiskunde & Informatica, Amsterdam)
Wouter Koolen (Centrum Wiskunde & Informatica, Amsterdam)

More from the Same Authors