Timezone: »
We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore we provide empirical evidence that the theory also holds in practice and discuss extensions to non-gaussian and multiple-armed case.
Author Information
Aurélien Garivier (Ecole Normale Supérieure de Lyon)
Tor Lattimore (DeepMind)
Emilie Kaufmann (Telecom ParisTech)
More from the Same Authors
-
2022 Poster: Regret Bounds for Information-Directed Reinforcement Learning »
Botao Hao · Tor Lattimore -
2018 Poster: TopRank: A practical algorithm for online stochastic ranking »
Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari -
2018 Poster: Single-Agent Policy Tree Search With Guarantees »
Laurent Orseau · Levi Lelis · Tor Lattimore · Theophane Weber -
2017 Poster: A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis »
Tor Lattimore -
2017 Poster: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2017 Spotlight: Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning »
Christoph Dann · Tor Lattimore · Emma Brunskill -
2016 Poster: Refined Lower Bounds for Adversarial Bandits »
Sébastien Gerchinovitz · Tor Lattimore -
2016 Poster: Causal Bandits: Learning Good Interventions via Causal Inference »
Finnian Lattimore · Tor Lattimore · Mark Reid -
2016 Poster: Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities »
Ruitong Huang · Tor Lattimore · András György · Csaba Szepesvari -
2015 Poster: The Pareto Regret Frontier for Bandits »
Tor Lattimore -
2015 Poster: Linear Multi-Resource Allocation with Semi-Bandit Feedback »
Tor Lattimore · Yacov Crammer · Csaba Szepesvari -
2014 Poster: Bounded Regret for Finite-Armed Structured Bandits »
Tor Lattimore · Remi Munos -
2013 Poster: Thompson Sampling for 1-Dimensional Exponential Family Bandits »
Nathaniel Korda · Emilie Kaufmann · Remi Munos