Poster
On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice
Shinji Ito
West Ballroom A-D #5803
Abstract:
This paper examines two extensions of multi-armed bandit problems: multi-armed bandits with expert advice and contextual linear bandits. For the former problem, multi-armed bandits with expert advice, the previously known best upper and lower bounds have been $O(\sqrt{KT \log(N/K)})$ and $\Omega(\sqrt{KT \log N / \log K})$, respectively. Here, $K$, $N$, and $T$ represent the numbers of arms, experts, and rounds, respectively. We provide a lower bound of $\Omega(\sqrt{KT \log(N/K)})$, matching the upper bound, for the setup in which the player chooses an expert before observing the advice in each round. For the latter problem, contextual linear bandits, we provide an algorithm that achieves $O(\sqrt{dT \log \lambda})$ together with a matching lower bound, where $d$ and $\lambda$ represent the dimensionality of feature vectors and the size of the context space, respectively.
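To make the first problem setting concrete, the sketch below simulates multi-armed bandits with expert advice using the classical EXP4 baseline, which attains the $O(\sqrt{KT \log N})$ regret regime that the bounds above refine. This is an illustrative sketch, not the paper's algorithm; the problem sizes, the stochastic loss model, and the learning rate `eta` are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, T = 3, 5, 2000            # arms, experts, rounds (illustrative sizes)
eta = np.sqrt(np.log(N) / (T * K))  # standard EXP4-style learning rate

# Synthetic advice: advice[t, n] is expert n's probability
# distribution over the K arms in round t.
advice = rng.dirichlet(np.ones(K), size=(T, N))
# Stochastic losses in [0, 1]: a fixed per-arm mean plus noise.
base_loss = np.linspace(0.2, 0.8, K)

w = np.ones(N)                  # exponential weights over experts
total_loss = 0.0
for t in range(T):
    q = w / w.sum()             # distribution over experts
    p = q @ advice[t]           # induced distribution over arms
    arm = rng.choice(K, p=p)
    loss = float(np.clip(base_loss[arm] + 0.1 * rng.standard_normal(), 0.0, 1.0))
    total_loss += loss
    # Importance-weighted loss estimate: only the pulled arm's loss is observed.
    loss_hat = np.zeros(K)
    loss_hat[arm] = loss / p[arm]
    # Each expert is charged the expected estimated loss under its advice.
    expert_loss = advice[t] @ loss_hat
    w *= np.exp(-eta * expert_loss)

print(f"average loss: {total_loss / T:.3f}")
```

Note that in this classical formulation the player observes all experts' advice before acting; the paper's new lower bound concerns the variant where an expert must be chosen before the advice is revealed.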