We study an online learning setting where the player is temporarily deprived of feedback each time it switches to a different action. Such a model of \emph{adaptive feedback} naturally occurs in scenarios where the environment reacts to the player's actions and requires some time to recover and stabilize after the algorithm switches actions. This motivates a variant of the multi-armed bandit problem, which we call the \emph{blinded multi-armed bandit}, in which no feedback is given to the algorithm whenever it switches arms. We develop efficient online learning algorithms for this problem and prove that they guarantee the same asymptotic regret as the optimal algorithms for the standard multi-armed bandit problem. This result stands in stark contrast to another recent result, which states that adding a switching cost to the standard multi-armed bandit makes it substantially harder to learn. Together, the two results provide a direct comparison of how feedback and loss contribute to the difficulty of an online learning problem. We also extend our results to the general prediction framework of bandit linear optimization, again attaining near-optimal regret bounds.
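To make the feedback protocol concrete, the following is a minimal sketch of the blinded interaction loop only, not of the algorithms developed in the paper. The function name `run_blinded_bandit`, the loss-table input format, and the `policy` callback signature are hypothetical choices made for illustration. The sketch also assumes that the first round does not count as a switch, which the abstract does not specify.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical policy signature: (round index, previous arm, last feedback) -> arm.
Policy = Callable[[int, Optional[int], Optional[float]], int]


def run_blinded_bandit(
    losses: Sequence[Sequence[float]],
    policy: Policy,
) -> Tuple[float, List[Optional[float]]]:
    """Simulate the blinded feedback protocol on a fixed loss table.

    losses[t][i] is the loss of arm i at round t (illustrative input format).
    Returns the total loss incurred and, per round, the loss that was
    revealed to the player (None when the round was blinded).
    """
    total_loss = 0.0
    observed: List[Optional[float]] = []
    prev_arm: Optional[int] = None
    feedback: Optional[float] = None

    for t, round_losses in enumerate(losses):
        arm = policy(t, prev_arm, feedback)
        loss = round_losses[arm]
        total_loss += loss  # the loss is always incurred, even when unseen

        # Feedback is withheld on any round in which the player switched arms.
        # Assumption: the very first round is not treated as a switch.
        switched = prev_arm is not None and arm != prev_arm
        feedback = None if switched else loss
        observed.append(feedback)
        prev_arm = arm

    return total_loss, observed


if __name__ == "__main__":
    table = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
    # Play arm 0 for two rounds, then switch to arm 1.
    total, seen = run_blinded_bandit(table, lambda t, prev, fb: 0 if t < 2 else 1)
    print(total, seen)
```

In this toy run the player pays the loss on the switching round but learns nothing from it. That is the difference from a switching-cost model, where a switch adds to the loss and leaves the feedback intact.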
Author Information
Ofer Dekel (Microsoft Research)
Elad Hazan (Technion)
Tomer Koren (Technion)
More from the Same Authors
- 2018 Poster: Learning SMaLL Predictors
  Vikas Garg · Ofer Dekel · Lin Xiao
- 2017 Poster: Online Learning with a Hint
  Ofer Dekel · Arthur Flajolet · Nika Haghtalab · Patrick Jaillet
- 2016 Poster: Online Pricing with Strategic and Patient Buyers
  Michal Feldman · Tomer Koren · Roi Livni · Yishay Mansour · Aviv Zohar
- 2016 Poster: The Limits of Learning with Missing Data
  Brian Bullins · Elad Hazan · Tomer Koren
- 2015 Poster: Fast Rates for Exp-concave Empirical Risk Minimization
  Tomer Koren · Kfir Y. Levy
- 2015 Poster: Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff
  Ofer Dekel · Ronen Eldan · Tomer Koren
- 2015 Spotlight: Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff
  Ofer Dekel · Ronen Eldan · Tomer Koren
- 2014 Poster: Bandit Convex Optimization: Towards Tight Bounds
  Elad Hazan · Kfir Y. Levy
- 2013 Poster: Distributed Exploration in Multi-Armed Bandits
  Eshcar Hillel · Zohar Karnin · Tomer Koren · Ronny Lempel · Oren Somekh
- 2013 Poster: Online Learning with Switching Costs and Other Adaptive Adversaries
  Nicolò Cesa-Bianchi · Ofer Dekel · Ohad Shamir
- 2013 Spotlight: Distributed Exploration in Multi-Armed Bandits
  Eshcar Hillel · Zohar Karnin · Tomer Koren · Ronny Lempel · Oren Somekh
- 2013 Session: Oral Session 8
  Ofer Dekel
- 2012 Poster: A Polylog Pivot Steps Simplex Algorithm for Classification
  Elad Hazan · Zohar S Karnin
- 2011 Poster: Approximating Semidefinite Programs in Sublinear Time
  Dan Garber · Elad Hazan
- 2011 Poster: Beating SGD: Learning SVMs in Sublinear Time
  Elad Hazan · Tomer Koren · Nati Srebro
- 2011 Poster: Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
  Elad Hazan · Satyen Kale
- 2010 Workshop: Learning on Cores, Clusters, and Clouds
  Alekh Agarwal · Lawrence Cayton · Ofer Dekel · John Duchi · John Langford
- 2010 Session: Spotlights Session 4
  Ofer Dekel
- 2010 Session: Oral Session 4
  Ofer Dekel
- 2009 Poster: Distribution-Calibrated Hierarchical Classification
  Ofer Dekel
- 2008 Poster: From Online to Batch Learning with Cutoff-Averaging
  Ofer Dekel
- 2006 Poster: Support Vector Machines on a Budget
  Ofer Dekel · Yoram Singer
- 2006 Spotlight: Support Vector Machines on a Budget
  Ofer Dekel · Yoram Singer