Poster
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Gergely Neu
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature, since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms, whose guarantees hold only in expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least $\Omega(\sqrt{T})$ times over $T$ rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that such strong results can be achieved without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.
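The IX idea is simple enough to sketch in a few lines. Below is a minimal, illustrative EXP3-style learner with IX loss estimates (the function name, toy losses, and parameter choices are ours, not the paper's pseudocode): each round the learner samples an arm from its exponential-weights distribution, and the observed loss is divided by the sampling probability *plus* a bias parameter gamma > 0, which caps the estimate at 1/gamma instead of letting it blow up as the probability shrinks.

```python
import numpy as np

def exp3_ix(losses, eta, gamma, rng):
    """EXP3 with Implicit eXploration (IX) loss estimates.

    losses: (T, K) array of per-round losses in [0, 1]; only the
    chosen arm's entry is revealed to the learner each round.
    Returns the (T, K) array of sampling distributions used.
    """
    T, K = losses.shape
    w = np.ones(K)                    # exponential weights
    probs = np.empty((T, K))
    for t in range(T):
        p = w / w.sum()
        probs[t] = p
        arm = rng.choice(K, p=p)      # bandit feedback: one loss observed
        # IX estimate: bias the importance weight by gamma, so the
        # estimate is bounded by 1/gamma rather than 1/p[arm].
        loss_hat = np.zeros(K)
        loss_hat[arm] = losses[t, arm] / (p[arm] + gamma)
        w *= np.exp(-eta * loss_hat)
    return probs

# Toy run: two arms with fixed losses 0.2 and 0.8.
rng = np.random.default_rng(0)
losses = np.tile([0.2, 0.8], (2000, 1))
eta = np.sqrt(2 * np.log(2) / (2 * 2000))   # a standard tuning
probs = exp3_ix(losses, eta=eta, gamma=eta / 2, rng=rng)
```

The choice gamma = eta/2 follows the tuning suggested in the paper; on this toy instance the learner's distribution concentrates on the better arm without any forced uniform exploration.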
Author Information
Gergely Neu (INRIA)
More from the Same Authors
- 2020 Poster: A Unifying View of Optimism in Episodic Reinforcement Learning »
  Gergely Neu · Ciara Pike-Burke
- 2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates »
  Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu
- 2019 Poster: Beating SGD Saturation with Tail-Averaging and Minibatching »
  Nicole Muecke · Gergely Neu · Lorenzo Rosasco
- 2017 Poster: Boltzmann Exploration Done Right »
  Nicolò Cesa-Bianchi · Claudio Gentile · Gergely Neu · Gabor Lugosi
- 2014 Poster: Exploiting easy data in online optimization »
  Amir Sani · Gergely Neu · Alessandro Lazaric
- 2014 Poster: Efficient learning by implicit exploration in bandit problems with side observations »
  Tomáš Kocák · Gergely Neu · Michal Valko · Remi Munos
- 2014 Spotlight: Exploiting easy data in online optimization »
  Amir Sani · Gergely Neu · Alessandro Lazaric
- 2014 Poster: Online combinatorial optimization with stochastic decision sets and adversarial losses »
  Gergely Neu · Michal Valko
- 2013 Poster: Online learning in episodic Markovian decision processes by relative entropy policy search »
  Alexander Zimin · Gergely Neu
- 2010 Spotlight: Online Markov Decision Processes under Bandit Feedback »
  Gergely Neu · András György · András Antos · Csaba Szepesvari
- 2010 Poster: Online Markov Decision Processes under Bandit Feedback »
  Gergely Neu · András György · Csaba Szepesvari · András Antos