Skip to yearly menu bar Skip to main content


Poster

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

Elad Hazan · Satyen Kale


Abstract:

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar (\alpha). We prove that the regret of \newtron is (O(\log T)) when (\alpha) is a constant that does not vary with horizon (T), and at most (O(T^{2/3})) if (\alpha) is allowed to increase to infinity with (T). For (\alpha) = (O(\log T)), the regret is bounded by (O(\sqrt{T})), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when (\alpha) is a small constant.

Live content is unavailable. Log in and register to view live content