Timezone: »
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling which requires to specify a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling Duelling Algorithms (SDA) which combines the sub-sampling idea first used by the BESA and SSMC algorithms with different sub-sampling schemes. In particular, RB-SDA uses Random Block sampling. We perform an experimental study assessing the flexibility and robustness of this promising novel approach for exploration in bandit models.
Author Information
Dorian Baudry (CNRS/Inria)
Emilie Kaufmann (CNRS)
Odalric-Ambrym Maillard (INRIA)
Related Events (a corresponding poster, oral, or spotlight)
-
2020 Spotlight: Sub-sampling for Efficient Non-Parametric Bandit Exploration »
Wed. Dec 9th 03:10 -- 03:20 AM Room Orals & Spotlights: Reinforcement Learning
More from the Same Authors
-
2021 Spotlight: Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits »
Reda Ouhamma · Odalric-Ambrym Maillard · Vianney Perchet -
2022 Poster: Top Two Algorithms Revisited »
Marc Jourdan · Rémy Degenne · Dorian Baudry · Rianne de Heide · Emilie Kaufmann -
2022 Poster: IMED-RL: Regret optimal learning of ergodic Markov decision processes »
Fabien Pesquerel · Odalric-Ambrym Maillard -
2022 Poster: Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs »
Andrea Tirinzoni · Aymen Al Marjani · Emilie Kaufmann -
2022 Poster: Near-Optimal Collaborative Learning in Bandits »
Clémence Réda · Sattar Vakili · Emilie Kaufmann -
2022 Poster: Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits »
Lilian Besson · Emilie Kaufmann · Odalric-Ambrym Maillard · Julien Seznec -
2021 Poster: Stochastic bandits with groups of similar arms. »
Fabien Pesquerel · Hassan SABER · Odalric-Ambrym Maillard -
2021 Poster: Indexed Minimum Empirical Divergence for Unimodal Bandits »
Hassan SABER · Pierre Ménard · Odalric-Ambrym Maillard -
2021 Poster: Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge »
Reda Ouhamma · Odalric-Ambrym Maillard · Vianney Perchet -
2021 Poster: From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits »
Dorian Baudry · Patrick Saux · Odalric-Ambrym Maillard -
2021 Poster: Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits »
Reda Ouhamma · Odalric-Ambrym Maillard · Vianney Perchet -
2020 Poster: Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs »
Edouard Leurent · Odalric-Ambrym Maillard · Denis Efimov -
2020 Oral: Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs »
Edouard Leurent · Odalric-Ambrym Maillard · Denis Efimov -
2020 Poster: Planning in Markov Decision Processes with Gap-Dependent Sample Complexity »
Anders Jonsson · Emilie Kaufmann · Pierre Menard · Omar Darwiche Domingues · Edouard Leurent · Michal Valko -
2019 Poster: Budgeted Reinforcement Learning in Continuous State Space »
Nicolas Carrara · Edouard Leurent · Romain Laroche · Tanguy Urvoy · Odalric-Ambrym Maillard · Olivier Pietquin -
2019 Poster: Learning Multiple Markov Chains via Adaptive Allocation »
Mohammad Sadegh Talebi · Odalric-Ambrym Maillard -
2019 Poster: Regret Bounds for Learning State Representations in Reinforcement Learning »
Ronald Ortner · Matteo Pirotta · Alessandro Lazaric · Ronan Fruit · Odalric-Ambrym Maillard -
2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty) »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar -
2014 Poster: "How hard is my MDP?" The distribution-norm to the rescue »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor -
2014 Oral: "How hard is my MDP?" The distribution-norm to the rescue »
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor -
2012 Poster: Online allocation and homogeneous partitioning for piecewise constant mean-approximation »
Alexandra Carpentier · Odalric-Ambrym Maillard -
2012 Poster: Hierarchical Optimistic Region Selection driven by Curiosity »
Odalric-Ambrym Maillard -
2011 Poster: Selecting the State-Representation in Reinforcement Learning »
Odalric-Ambrym Maillard · Remi Munos · Daniil Ryabko -
2011 Poster: Sparse Recovery with Brownian Sensing »
Alexandra Carpentier · Odalric-Ambrym Maillard · Remi Munos -
2010 Spotlight: LSTD with Random Projections »
Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos -
2010 Poster: LSTD with Random Projections »
Mohammad Ghavamzadeh · Alessandro Lazaric · Odalric-Ambrym Maillard · Remi Munos -
2010 Poster: Scrambled Objects for Least-Squares Regression »
Odalric-Ambrym Maillard · Remi Munos -
2009 Poster: Compressed Least-Squares Regression »
Odalric-Ambrym Maillard · Remi Munos