Timezone: »
Oral
The Price of Bandit Information for Online Optimization
Varsha Dani · Thomas P Hayes · Sham M Kakade
In the online linear optimization problem, a learner must choose, in each round, a decision from a set $D \subset \R^n$ in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and in the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the \emph{price of bandit information}  how much worse the regret is in the bandit case as compared to the full information case. For the full information case, the upper bound on the regret is $O^*(\sqrt{nT})$, where $n$ is the ambient dimension and $T$ is the time horizon. For the bandit case, we present an algorithm which achieves $O^*(n^{3/2}\sqrt{T})$ regret  all previous (nontrivial) bounds here were $O(\textrm{poly}(n)T^{2/3})$ or worse. It is striking that the convergence rate for the bandit setting is only a factor of $n$ worse than in the full information case  in stark contrast to the $K$arm bandit setting, where the gap in the dependence on $K$ is exponential ($\sqrt{TK}$ vs. $\sqrt{T\log K}$). We also present lower bounds showing that this gap is at least $\sqrt{n}$, which we conjecture to be the correct order. The bandit algorithm we present can be implemented efficiently in special cases of particular interest, such as path planning and Markov Decision Problems.
Author Information
Varsha Dani (University of Chicago)
Thomas P Hayes (Toyota Technological Institute at Chicago)
Sham M Kakade (Harvard University & Microsoft Research)
Related Events (a corresponding poster, oral, or spotlight)

2007 Poster: The Price of Bandit Information for Online Optimization »
Tue. Dec 4th 06:30  06:40 PM Room
More from the Same Authors

2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning Q&A »
Sham M Kakade · Martha White · Nicolas Le Roux 
2020 Tutorial: (Track3) Policy Optimization in Reinforcement Learning »
Sham M Kakade · Martha White · Nicolas Le Roux 
2013 Poster: When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity »
Anima Anandkumar · Daniel Hsu · Majid Janzamin · Sham M Kakade 
2012 Poster: Learning Mixtures of Tree Graphical Models »
Anima Anandkumar · Daniel Hsu · Furong Huang · Sham M Kakade 
2012 Poster: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · YiKai Liu 
2012 Poster: Identifiability and Unmixing of Latent Parse Trees »
Percy Liang · Sham M Kakade · Daniel Hsu 
2012 Spotlight: A Spectral Algorithm for Latent Dirichlet Allocation »
Anima Anandkumar · Dean P Foster · Daniel Hsu · Sham M Kakade · YiKai Liu 
2011 Poster: Stochastic convex optimization with bandit feedback »
Alekh Agarwal · Dean P Foster · Daniel Hsu · Sham M Kakade · Sasha Rakhlin 
2011 Poster: Spectral Methods for Learning Multivariate Latent Tree Structure »
Anima Anandkumar · Kamalika Chaudhuri · Daniel Hsu · Sham M Kakade · Le Song · Tong Zhang 
2011 Poster: Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression »
Sham M Kakade · Adam Kalai · Varun Kanade · Ohad Shamir 
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade 
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade 
2009 Poster: MultiLabel Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang 
2009 Oral: MultiLabel Prediction via Compressed Sensing »
Daniel Hsu · Sham M Kakade · John Langford · Tong Zhang 
2008 Poster: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai ShalevShwartz · Sham M Kakade 
2008 Poster: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari 
2008 Spotlight: On the Generalization Ability of Online Strongly Convex Programming Algorithms »
Sham M Kakade · Ambuj Tewari 
2008 Spotlight: Mind the Duality Gap: Logarithmic regret algorithms for online optimization »
Shai ShalevShwartz · Sham M Kakade 
2008 Poster: On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization »
Sham M Kakade · Karthik Sridharan · Ambuj Tewari