NeurIPS Poster Ensemble sampling for linear bandits: small ensembles suffice

David Janz · Alexander Litvak · Csaba Szepesvari

West Ballroom A-D #6800

Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ (which defeats the purpose of ensemble sampling) while obtaining near $\sqrt{T}$ order regret. Our result is also the first to allow for infinite action sets.
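To give a feel for the scaling stated in the abstract, the following minimal sketch evaluates the two orders, $d \log T$ for the ensemble size and $(d \log T)^{5/2} \sqrt{T}$ for the regret, at a few horizons. The function names are hypothetical, the natural logarithm is an assumption, and all constants are omitted, so the numbers indicate growth rates only and are not the bounds proved in the paper.

```python
import math


def ensemble_size_order(d: int, horizon: int) -> float:
    """Order of the ensemble size, d * log(T). Constants omitted; natural log assumed."""
    return d * math.log(horizon)


def regret_bound_order(d: int, horizon: int) -> float:
    """Order of the regret bound, (d * log(T))^(5/2) * sqrt(T). Constants omitted."""
    return (d * math.log(horizon)) ** 2.5 * math.sqrt(horizon)


if __name__ == "__main__":
    d = 10  # illustrative dimension, not taken from the paper
    for horizon in (10**4, 10**6, 10**8):
        m = ensemble_size_order(d, horizon)
        r = regret_bound_order(d, horizon)
        # Compare the d*log(T) ensemble size with an ensemble that grows linearly in T.
        print(f"T={horizon:>9}  d*log(T)={m:7.1f}  m/T={m / horizon:.2e}  regret order={r:.3e}")
```

Under these assumptions the ensemble size grows only logarithmically as $T$ increases, which is the contrast with linear-in-$T$ ensembles that the abstract draws.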