Poster
On the Minimax Regret for Contextual Linear Bandits and Multi-Armed Bandits with Expert Advice
Shinji Ito
West Ballroom A-D #5803
Abstract:
This paper examines two extensions of multi-armed bandit problems: multi-armed bandits with expert advice and contextual linear bandits. For the former problem, multi-armed bandits with expert advice, the previously known best upper and lower bounds have been $O(\sqrt{KT \log(N/K)})$ and $\Omega(\sqrt{KT \log N / \log K})$, respectively. Here, $K$, $N$, and $T$ represent the numbers of arms, experts, and rounds, respectively. We provide a lower bound of $\Omega(\sqrt{KT \log(N/K)})$, matching the upper bound, for the setup in which the player chooses an expert before observing the advice in each round. For the latter problem, contextual linear bandits, we provide an algorithm that achieves $O(\sqrt{dT \log \lambda})$ together with a matching lower bound, where $d$ and $\lambda$ represent the dimensionality of feature vectors and the size of the context space, respectively.
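To make the first problem setting concrete, the sketch below simulates multi-armed bandits with expert advice using the classical EXP4 baseline, which attains the $O(\sqrt{KT \log N})$ regret regime that the bounds above refine. This is an illustrative sketch, not the paper's algorithm; the problem sizes, the stochastic loss model, and the learning rate `eta` are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, T = 3, 5, 2000            # arms, experts, rounds (illustrative sizes)
eta = np.sqrt(np.log(N) / (T * K))  # standard EXP4-style learning rate

# Synthetic advice: advice[t, n] is expert n's probability
# distribution over the K arms in round t.
advice = rng.dirichlet(np.ones(K), size=(T, N))
# Stochastic losses in [0, 1]: a fixed per-arm mean plus noise.
base_loss = np.linspace(0.2, 0.8, K)

w = np.ones(N)                  # exponential weights over experts
total_loss = 0.0
for t in range(T):
    q = w / w.sum()             # distribution over experts
    p = q @ advice[t]           # induced distribution over arms
    arm = rng.choice(K, p=p)
    loss = float(np.clip(base_loss[arm] + 0.1 * rng.standard_normal(), 0.0, 1.0))
    total_loss += loss
    # Importance-weighted loss estimate: only the pulled arm's loss is observed.
    loss_hat = np.zeros(K)
    loss_hat[arm] = loss / p[arm]
    # Each expert is charged the expected estimated loss under its advice.
    expert_loss = advice[t] @ loss_hat
    w *= np.exp(-eta * expert_loss)

print(f"average loss: {total_loss / T:.3f}")
```

Note that in this classical formulation the player observes all experts' advice before acting; the paper's new lower bound concerns the variant where an expert must be chosen before the advice is revealed.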