Sub-sampling for Efficient Non-Parametric Bandit Exploration
Dorian Baudry · Emilie Kaufmann · Odalric-Ambrym Maillard

Tue Dec 08 07:10 PM -- 07:20 PM (PST) @ Orals & Spotlights: Reinforcement Learning

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for several families of arm distributions (namely Bernoulli, Gaussian, and Poisson). Unlike Thompson Sampling, which requires specifying a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA), which combine the sub-sampling idea first used by the BESA and SSMC algorithms with different sub-sampling schemes; in particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach to exploration in bandit models.
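To make the duelling mechanism concrete, the following Python sketch shows one round of a generic SDA combined with the Random Block sub-sampling scheme mentioned in the abstract. This is an illustrative sketch only, not the authors' implementation: the function names, the choice of the most-sampled arm as leader, and the tie-breaking rule (challenger wins on ties) are assumptions made for this example.

```python
import random

def random_block_subsample(history, size, rng):
    """Draw a contiguous block of `size` observations from the leader's
    history, starting at a uniformly random index (Random Block scheme)."""
    start = rng.randrange(len(history) - size + 1)
    return history[start:start + size]

def sda_round(histories, rng):
    """One round of a generic Sub-sampling Duelling Algorithm (sketch).

    `histories` maps arm index -> list of observed rewards. The leader is
    the arm with the most observations; every other arm duels it by
    comparing its full-sample mean to the mean of a size-matched random
    block from the leader's history. Returns the set of arms to pull next.
    """
    leader = max(histories, key=lambda a: len(histories[a]))
    to_pull = set()
    for arm, obs in histories.items():
        if arm == leader:
            continue
        block = random_block_subsample(histories[leader], len(obs), rng)
        # Challenger wins the duel (assumed tie-breaking: >=) and is pulled.
        if sum(obs) / len(obs) >= sum(block) / len(block):
            to_pull.add(arm)
    if not to_pull:  # leader wins every duel, so the leader is pulled
        to_pull.add(leader)
    return to_pull
```

A full bandit loop would repeat this round, append each new reward to the pulled arm's history, and thereby explore without any distribution-dependent tuning.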

Author Information

Dorian Baudry (CNRS/Inria)
Emilie Kaufmann (CNRS)
Odalric-Ambrym Maillard (INRIA)
