Skip to yearly menu bar Skip to main content


Poster

First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Julia Olkhovskaya · Jack Mayo · Tim van Erven · Gergely Neu · Chen-Yu Wei

Great Hall & Hall B1+B2 (level 1) #1805

Abstract: We consider the adversarial linear contextual bandit setting, whichallows for the loss functions associated with each of KK arms to changeover time without restriction. Assuming the dd-dimensional contexts aredrawn from a fixed known distribution, the worst-case expected regretover the course of TT rounds is known to scale as ˜O(KdT)~O(KdT). Under the additional assumption that the density of the contextsis log-concave, we obtain a second-order bound of order \tildeO(KdVT)\tildeO(KdVT) in terms of the cumulative second moment of thelearner's losses VTVT, and a closely related first-order bound of order˜O(KdLT)~O(KdLT) in terms of the cumulative loss of the bestpolicy LTLT. Since VTVT or LTLT may be significantly smaller thanTT, these improve over the worst-case regret whenever the environmentis relatively benign. Our results are obtained using a truncated versionof the continuous exponential weights algorithm over the probabilitysimplex, which we analyse by exploiting a novel connection to the linearbandit setting without contexts.

Chat is not available.