Timezone: »
Poster
OracleEfficient Algorithms for Online Linear Optimization with Bandit Feedback
Shinji Ito · Daisuke Hatano · Hanna Sumita · Kei Takemura · Takuro Fukunaga · Naonori Kakimura · KenIchi Kawarabayashi
Tue Dec 10 05:30 PM  07:30 PM (PST) @ East Exhibition Hall B + C #20
We propose computationally efficient algorithms for \textit{online linear optimization with bandit feedback}, in which a player chooses an \textit{action vector} from a given (possibly infinite) set $\mathcal{A} \subseteq \mathbb{R}^d$, and then suffers a loss that can be expressed as a linear function in action vectors. Although existing algorithms achieve an optimal regret bound of $\tilde{O}(\sqrt{T})$ for $T$ rounds (ignoring factors of $\mathrm{poly} (d, \log T)$), computationally efficient ways of implementing them have not yet been specified, in particular when $\mathcal{A}$ is not bounded by a polynomial size in $d$. A standard way to pursue computational efficiency is to assume that we have an efficient algorithm referred to as \textit{oracle} that solves (offline) linear optimization problems over $\mathcal{A}$. Under this assumption, the computational efficiency of a bandit algorithm can then be measured in terms of \textit{oracle complexity}, i.e., the number of oracle calls. Our contribution is to propose algorithms that offer optimal regret bounds of $\tilde{O}(\sqrt{T})$ as well as low oracle complexity for both \textit{nonstochastic settings} and \textit{stochastic settings}. Our algorithm for nonstochastic settings has an oracle complexity of $\tilde{O}( T )$ and is the first algorithm that achieves both a regret bound of $\tilde{O}( \sqrt{T} )$ and an oracle complexity of $\tilde{O} ( \mathrm{poly} ( T ) )$, given only linear optimization oracles. Our algorithm for stochastic settings calls the oracle only $O( \mathrm{poly} (d, \log T))$ times, which is smaller than the current best oracle complexity of $O( T )$ if $T$ is sufficiently large.
Author Information
Shinji Ito (NEC Corporation)
Daisuke Hatano (RIKEN AIP)
Hanna Sumita (Tokyo Metropolitan University)
Kei Takemura (NEC Corporation)
Takuro Fukunaga (Chuo University, JST PRESTO, RIKEN AIP)
Naonori Kakimura (Keio University)
KenIchi Kawarabayashi (National Institute of Informatics)
More from the Same Authors

2019 Poster: Improved Regret Bounds for Bandit Combinatorial Optimization »
Shinji Ito · Daisuke Hatano · Hanna Sumita · Kei Takemura · Takuro Fukunaga · Naonori Kakimura · KenIchi Kawarabayashi 
2019 Poster: Submodular Function Minimization with Noisy Evaluation Oracle »
Shinji Ito 
2018 Poster: Regret Bounds for Online Portfolio Selection with a Cardinality Constraint »
Shinji Ito · Daisuke Hatano · Hanna Sumita · Akihiro Yabe · Takuro Fukunaga · Naonori Kakimura · KenIchi Kawarabayashi 
2017 Poster: Efficient SublinearRegret Algorithms for Online Sparse Linear Regression with Limited Observation »
Shinji Ito · Daisuke Hatano · Hanna Sumita · Akihiro Yabe · Takuro Fukunaga · Naonori Kakimura · KenIchi Kawarabayashi