Timezone: »
Poster
Fighting Bandits with a New Kind of Smoothness
Jacob D Abernethy · Chansoo Lee · Ambuj Tewari
We focus on the adversarial multi-armed bandit problem. The EXP3 algorithm of Auer et al. (2003) was shown to have a regret bound of $O(\sqrt{T N \log N})$, where $T$ is the time horizon and $N$ is the number of available actions (arms). More recently, Audibert and Bubeck (2009) improved the bound by a logarithmic factor via an entirely different method. In the present work, we provide a new set of analysis tools, using the notion of convex smoothing, to provide several novel algorithms with optimal guarantees. First we show that regularization via the Tsallis entropy matches the minimax rate of Audibert and Bubeck (2009) with an even tighter constant; it also fully generalizes EXP3. Second we show that a wide class of perturbation methods lead to near-optimal bandit algorithms as long as a simple condition on the perturbation distribution $\mathcal{D}$ is met: one needs that the hazard function of $\mathcal{D}$ remain bounded. The Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property; interestingly, the Gaussian and Uniform distributions do not.
Author Information
Jacob D Abernethy (University of Michigan)
Chansoo Lee (University of Michigan)
Ambuj Tewari (University of Michigan)
More from the Same Authors
-
2020 Poster: TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search »
Tarun Gogineni · Ziping Xu · Exequiel Punzalan · Runxuan Jiang · Joshua Kammeraad · Ambuj Tewari · Paul Zimmerman -
2020 Poster: Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting »
Ziping Xu · Ambuj Tewari -
2020 Poster: On the Equivalence between Online and Private Learnability beyond Binary Classification »
Young H Jung · Baekjin Kim · Ambuj Tewari -
2020 Spotlight: On the Equivalence between Online and Private Learnability beyond Binary Classification »
Young H Jung · Baekjin Kim · Ambuj Tewari -
2019 Poster: Generalization Bounds in the Predict-then-Optimize Framework »
Othman El Balghiti · Adam N. Elmachtoub · Paul Grigas · Ambuj Tewari -
2019 Poster: Online Learning via the Differential Privacy Lens »
Jacob Abernethy · Young H Jung · Chansoo Lee · Audra McMillan · Ambuj Tewari -
2019 Spotlight: Online Learning via the Differential Privacy Lens »
Jacob Abernethy · Young H Jung · Chansoo Lee · Audra McMillan · Ambuj Tewari -
2019 Poster: Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems »
Young H Jung · Ambuj Tewari -
2019 Poster: On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems »
Baekjin Kim · Ambuj Tewari -
2018 Workshop: CiML 2018 - Machine Learning competitions "in the wild": Playing in the real world or in real time »
Isabelle Guyon · Evelyne Viegas · Sergio Escalera · Jacob D Abernethy -
2018 Poster: Active Learning for Non-Parametric Regression Using Purely Random Trees »
Jack Goetz · Ambuj Tewari · Paul Zimmerman -
2018 Poster: But How Does It Work in Theory? Linear SVM with Random Features »
Yitong Sun · Anna Gilbert · Ambuj Tewari -
2017 Workshop: Machine Learning Challenges as a Research Tool »
Isabelle Guyon · Evelyne Viegas · Sergio Escalera · Jacob D Abernethy -
2017 Poster: Action Centered Contextual Bandits »
Kristjan Greenewald · Ambuj Tewari · Susan Murphy · Predag Klasnja -
2017 Poster: Online multiclass boosting »
Young H Jung · Jack Goetz · Ambuj Tewari -
2017 Poster: On Frank-Wolfe and Equilibrium Computation »
Jacob D Abernethy · Jun-Kun Wang -
2017 Spotlight: On Frank-Wolfe and Equilibrium Computation »
Jacob D Abernethy · Jun-Kun Wang -
2016 Poster: Hardness of Online Sleeping Combinatorial Optimization Problems »
Satyen Kale · Chansoo Lee · David Pal -
2016 Poster: Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games »
Sougata Chaudhuri · Ambuj Tewari -
2016 Poster: Threshold Bandits, With and Without Censored Feedback »
Jacob D Abernethy · Kareem Amin · Ruihao Zhu -
2015 Poster: Predtron: A Family of Online Algorithms for General Prediction Problems »
Prateek Jain · Nagarajan Natarajan · Ambuj Tewari -
2015 Poster: A Market Framework for Eliciting Private Data »
Bo Waggoner · Rafael Frongillo · Jacob D Abernethy -
2015 Poster: Alternating Minimization for Regression Problems with Vector-valued Outputs »
Prateek Jain · Ambuj Tewari -
2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari -
2014 Workshop: NIPS Workshop on Transactional Machine Learning and E-Commerce »
David Parkes · David H Wolpert · Jennifer Wortman Vaughan · Jacob D Abernethy · Amos Storkey · Mark Reid · Ping Jin · Nihar Bhadresh Shah · Mehryar Mohri · Luis E Ortiz · Robin Hanson · Aaron Roth · Satyen Kale · Sebastien Lahaie -
2014 Poster: On Iterative Hard Thresholding Methods for High-dimensional M-Estimation »
Prateek Jain · Ambuj Tewari · Purushottam Kar -
2013 Poster: Minimax Optimal Algorithms for Unconstrained Linear Optimization »
Brendan McMahan · Jacob D Abernethy -
2013 Poster: Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses »
Harish G Ramaswamy · Shivani Agarwal · Ambuj Tewari -
2013 Poster: Adaptive Market Making via Online Learning »
Jacob D Abernethy · Satyen Kale -
2013 Poster: How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal »
Jacob D Abernethy · Peter Bartlett · Rafael Frongillo · Andre Wibisono -
2013 Spotlight: How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal »
Jacob D Abernethy · Peter Bartlett · Rafael Frongillo · Andre Wibisono -
2013 Oral: Adaptive Market Making via Online Learning »
Jacob D Abernethy · Satyen Kale -
2013 Spotlight: Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses »
Harish G Ramaswamy · Shivani Agarwal · Ambuj Tewari -
2013 Poster: Learning with Noisy Labels »
Nagarajan Natarajan · Inderjit Dhillon · Pradeep Ravikumar · Ambuj Tewari -
2012 Poster: Feature Clustering for Accelerating Parallel Coordinate Descent »
Chad Scherrer · Ambuj Tewari · Mahantesh Halappanavar · David Haglin -
2011 Poster: A Collaborative Mechanism for Crowdsourcing Prediction Problems »
Jacob D Abernethy · Rafael Frongillo -
2011 Oral: A Collaborative Mechanism for Crowdsourcing Prediction Problems »
Jacob D Abernethy · Rafael Frongillo -
2011 Poster: Greedy Algorithms for Structurally Constrained High Dimensional Problems »
Ambuj Tewari · Pradeep Ravikumar · Inderjit Dhillon -
2011 Poster: On the Universality of Online Mirror Descent »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2011 Poster: Nearest Neighbor based Greedy Coordinate Descent »
Inderjit Dhillon · Pradeep Ravikumar · Ambuj Tewari -
2011 Poster: Online Learning: Stochastic, Constrained, and Smoothed Adversaries »
Sasha Rakhlin · Karthik Sridharan · Ambuj Tewari -
2011 Poster: Orthogonal Matching Pursuit with Replacement »
Prateek Jain · Ambuj Tewari · Inderjit Dhillon -
2010 Oral: Online Learning: Random Averages, Combinatorial Parameters, and Learnability »
Sasha Rakhlin · Karthik Sridharan · Ambuj Tewari -
2010 Poster: Online Learning: Random Averages, Combinatorial Parameters, and Learnability »
Sasha Rakhlin · Karthik Sridharan · Ambuj Tewari -
2010 Poster: Smoothness, Low Noise and Fast Rates »
Nati Srebro · Karthik Sridharan · Ambuj Tewari -
2010 Poster: Repeated Games against Budgeted Adversaries »
Jacob D Abernethy · Manfred K. Warmuth -
2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari -
2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization »
Sham M Kakade · Karthik Sridharan · Ambuj Tewari -
2007 Poster: Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs »
Ambuj Tewari · Peter Bartlett -
2006 Poster: Sample Complexity of Policy Search with Known Dynamics »
Peter Bartlett · Ambuj Tewari