NeurIPS Poster Beyond Bandit Feedback in Online Multiclass Classification


Dirk van der Hoeven · Federico Fusco · Nicolò Cesa-Bianchi

Keywords: [ Bandits ] [ Online Learning ] [ Machine Learning ] [ Graph Learning ] [ Reinforcement Learning and Planning ]


Abstract: We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. While including bandit feedback as a special case, feedback graphs allow a much richer set of applications, including filtering and label efficient classification. We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs. For this new algorithm, we prove surrogate regret bounds that hold, both in expectation and with high probability, for a large class of surrogate losses. Our bounds are of order $B\sqrt{\rho KT}$, where $B$ is the diameter of the prediction space, $K$ is the number of classes, $T$ is the time horizon, and $\rho$ is the domination number (a graph-theoretic parameter affecting the amount of exploration). In the full information case, we show that Gappletron achieves a constant surrogate regret of order $B^2K$. We also prove a general lower bound of order $\max\big\{B^2K,\sqrt{T}\big\}$, showing that our upper bounds are not significantly improvable. Experiments on synthetic data show that for various feedback graphs our algorithm is competitive against known baselines.
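To make the role of the domination number $\rho$ concrete, the following is a minimal illustrative sketch, not code from the paper. It assumes a feedback graph is stored as a dictionary mapping each class to its set of out-neighbours, that an edge from class i to class j means predicting i reveals whether j is the true label, and that a node always counts as dominating itself. The function names (`domination_number`, `bandit_graph`, `full_information_graph`) are hypothetical, and the brute-force search is only practical for a small number of classes.

```python
from itertools import combinations


def dominated_by(graph, nodes):
    """Return the set of classes covered by `nodes`.

    Assumption: a node covers itself and all of its out-neighbours.
    """
    covered = set(nodes)
    for i in nodes:
        covered |= graph[i]
    return covered


def domination_number(graph):
    """Size of a smallest dominating set, found by exhaustive search.

    Exponential in the number of classes; intended for small examples only.
    """
    classes = list(graph)
    everything = set(classes)
    for size in range(1, len(classes) + 1):
        for subset in combinations(classes, size):
            if dominated_by(graph, subset) == everything:
                return size
    return len(classes)


def bandit_graph(num_classes):
    """Self-loops only: a prediction reveals only whether it was correct."""
    return {i: {i} for i in range(num_classes)}


def full_information_graph(num_classes):
    """Complete graph: any prediction reveals the true label."""
    return {i: set(range(num_classes)) for i in range(num_classes)}


if __name__ == "__main__":
    K = 4
    print("bandit:", domination_number(bandit_graph(K)))
    print("full information:", domination_number(full_information_graph(K)))
```

Under these assumptions, the bandit graph on $K$ classes has $\rho = K$ and the full information graph has $\rho = 1$, which illustrates why a smaller domination number corresponds to less exploration.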
