Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits
Shinji Ito, Shuichi Hirahara, Tasuku Soma, Yuichi Yoshida
Spotlight presentation: Orals & Spotlights Track 24: Learning Theory
on 2020-12-09, 19:10–19:20 (PST)
Abstract: We propose novel algorithms with first- and second-order regret bounds for adversarial linear bandits. These regret bounds imply that our algorithms perform well when some action achieves a small cumulative loss or when the losses have small variance. In addition, we require weaker assumptions than existing algorithms: our algorithms work on discrete action sets as well as continuous ones without a priori knowledge about losses, and they run efficiently whenever a linear optimization oracle for the action set is available. These results are obtained by combining optimistic online optimization, continuous multiplicative weight update methods, and a novel technique that we refer to as distribution truncation. We also show that the regret bounds of our algorithms are tight up to polylogarithmic factors.
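For context, the sketch below is not the paper's algorithm; it is a minimal, self-contained illustration of the classical multiplicative-weight-update approach to adversarial linear bandits on a discrete action set (in the style of EXP2 with uniform exploration), the baseline family that the paper's continuous MWU and distribution-truncation techniques refine. All parameters (d, K, T, eta, gamma) and the toy oblivious adversary are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: K unit-norm actions in R^d, T rounds,
# an oblivious adversary choosing small loss vectors in advance.
d, K, T, eta, gamma = 3, 20, 5000, 0.05, 0.05
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
loss_vecs = rng.uniform(-0.3, 0.3, size=(T, d))

log_w = np.zeros(K)          # log-weights of the exponential-weights distribution
cum_loss = 0.0
for t in range(T):
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    p = (1 - gamma) * p + gamma / K          # mix in uniform exploration
    i = rng.choice(K, p=p)                   # play sampled action a_i
    obs = actions[i] @ loss_vecs[t]          # observe only the scalar loss <a_i, l_t>
    cum_loss += obs

    # Unbiased least-squares loss estimator: hat_l = S^{-1} a_i <a_i, l_t>,
    # where S = E_p[a a^T] is the covariance of the sampling distribution;
    # taking expectations over i recovers E[hat_l] = l_t.
    S = actions.T @ (p[:, None] * actions)
    hat_l = np.linalg.solve(S, actions[i]) * obs

    log_w -= eta * (actions @ hat_l)         # multiplicative weight update

best = (actions @ loss_vecs.sum(axis=0)).min()
print(f"regret vs. best fixed action: {cum_loss - best:.2f} over T={T} rounds")
```

A first-order bound, in this vocabulary, would scale with the cumulative loss of the best fixed action (the `best` quantity above) rather than with T, and a second-order bound with the variance of the losses; achieving these guarantees efficiently, with only a linear optimization oracle and without prior knowledge of the losses, is the contribution of the paper.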