NeurIPS Poster An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits

Shinji Ito · Kei Takemura

Great Hall & Hall B1+B2 (level 1) #1812

Abstract: In this paper, we consider how to construct best-of-both-worlds linear bandit algorithms that achieve nearly optimal performance for both stochastic and adversarial environments. For this purpose, we show that a natural approach referred to as exploration by optimization [Lattimore and Szepesvári, 2020] works well. Specifically, an algorithm constructed using this approach achieves $O(d \sqrt{T \log T})$-regret in adversarial environments and $O\left(\frac{d^2 \log T}{\Delta_{\min}}\right)$-regret in stochastic environments. Here $d$, $T$, and $\Delta_{\min}$ denote the dimensionality of the action set, the time horizon, and the minimum sub-optimality gap, respectively. We also show that this algorithm has even better theoretical guarantees for important special cases, including the multi-armed bandit problem and multitask bandits.
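To see how the two regret rates behave as the horizon grows, they can be evaluated numerically. The Python sketch below is illustrative only: it assumes the constants hidden by $O(\cdot)$ are equal to 1, so the absolute values carry no meaning and only the growth in $T$ is informative. The function names `adversarial_bound` and `stochastic_bound` and the example values of $d$, $T$, and $\Delta_{\min}$ are hypothetical and do not come from the paper.

```python
import math


def adversarial_bound(d: int, T: int) -> float:
    """d * sqrt(T * log T): the adversarial rate, with the hidden O(.) constant assumed to be 1."""
    return d * math.sqrt(T * math.log(T))


def stochastic_bound(d: int, T: int, gap_min: float) -> float:
    """d^2 * log T / Delta_min: the stochastic rate, with the hidden O(.) constant assumed to be 1."""
    if gap_min <= 0:
        raise ValueError("Delta_min must be positive")
    return d ** 2 * math.log(T) / gap_min


if __name__ == "__main__":
    # Hypothetical example values: dimension d = 10, minimum gap Delta_min = 0.1.
    d, gap_min = 10, 0.1
    for T in (10**3, 10**5, 10**7):
        adv = adversarial_bound(d, T)
        sto = stochastic_bound(d, T, gap_min)
        print(f"T={T:>9}  adversarial~{adv:10.0f}  stochastic~{sto:10.0f}")
```

With $d = 10$ and $\Delta_{\min} = 0.1$, the gap-dependent expression is larger than the adversarial one at $T = 10^3$ and $T = 10^5$ but smaller at $T = 10^7$, reflecting its $\log T$ growth compared with $\sqrt{T \log T}$.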
