NeurIPS Poster Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits

Poster

Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits

Julien Zhou · Pierre Gaillard · Thibaud Rahier · Houssam Zenati · Julyan Arbel

West Ballroom A-D #5809

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player selects among

P

$P$ actions from the power set of a set containing

d

$d$ base items. Adaptivity to the problem's structure is essential in order to obtain optimal regret upper bounds. As estimating the coefficients of a covariance matrix can be manageable in practice, leveraging them should improve the regret. We design

optimistic'' covariance-adaptive algorithms relying on online estimations of the covariance structure, called OLS-UCB-C and COS-V (only the variances for the latter). They both yields improved gap-free regret. Although COS-V can be slightly suboptimal, it improves on computational complexity by taking inspiration from Thompson Sampling approaches. It is the first sampling-based algorithm satisfying a

\sqrt{T}

$\sqrt{T}$ gap-free regret (up to poly-logs). We also show that in some cases, our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches, not only in exponential regimes where

P ≫ d

$P\gg d$ but also when

P \leq d

$P\leq d$ , which is not covered by existing analyses.

Chat is not available.