Gaussian Randomized Exploration for Semi-bandits with Sleeping Arms
Zhiming Huang · Bingshan Hu · Jianping Pan
Keywords:
Sleeping Arms
Combinatorial Bandits
Gaussian Priors
Multi-armed Bandits
Online Learning
Semi-bandit Feedback
Thompson Sampling
Abstract
This paper provides theoretical analyses of problem-independent upper and lower regret bounds for Gaussian randomized algorithms in semi-bandits with sleeping arms, where arms may be unavailable in certain rounds and available arms satisfying combinatorial constraints can be played simultaneously. We first introduce the CTS-G algorithm, an adaptation of Thompson sampling with Gaussian priors, which achieves an upper regret bound of $\tilde{O}(m\sqrt{NT})$ over $T$ rounds with $N$ arms and up to $m$ arms played per round, where $\tilde{O}$ hides logarithmic factors. Next, we present CL-SG, which improves upon CTS-G by using a single Gaussian sample per round and achieves a near-optimal upper regret bound of $\tilde{O}(\sqrt{mNT})$. We also establish lower regret bounds of $\Omega(\sqrt{mNT \ln \frac{N}{m}})$ and $\Omega(\sqrt{mNT})$ for the two algorithms, respectively.
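As a rough illustration of the Gaussian randomized exploration idea described in the abstract, the following Python sketch perturbs each available arm's empirical mean with a Gaussian sample whose variance shrinks with the number of plays, then plays the top-$m$ available arms under semi-bandit feedback. The availability model, variance schedule, and top-$m$ oracle here are illustrative assumptions, not the paper's exact CTS-G or CL-SG specification.

```python
import numpy as np

# Minimal sketch of Gaussian-randomized exploration (Thompson-sampling style)
# for a sleeping semi-bandit with a size-m constraint. All modeling choices
# below (availability probability, variance schedule, top-m selection) are
# assumptions for illustration only.

rng = np.random.default_rng(0)

N, m, T = 10, 3, 2000                       # arms, budget per round, horizon
true_means = rng.uniform(0.2, 0.8, size=N)  # unknown Bernoulli means (simulation only)

counts = np.zeros(N)      # number of plays per arm
emp_means = np.zeros(N)   # empirical mean reward per arm

for t in range(1, T + 1):
    # Sleeping arms: an arbitrary subset of arms is available this round.
    available = np.flatnonzero(rng.random(N) < 0.7)
    if available.size == 0:
        continue

    # Gaussian perturbation of each available arm's empirical mean; the
    # standard deviation shrinks as the arm accumulates plays.
    std = np.sqrt(1.0 / np.maximum(counts[available], 1.0))
    scores = emp_means[available] + rng.normal(0.0, std)

    # Combinatorial constraint (here: play at most m arms) -> pick top-m scores.
    k = min(m, available.size)
    chosen = available[np.argsort(scores)[-k:]]

    # Semi-bandit feedback: observe a reward for every played arm and update
    # its empirical mean incrementally.
    rewards = rng.binomial(1, true_means[chosen])
    counts[chosen] += 1
    emp_means[chosen] += (rewards - emp_means[chosen]) / counts[chosen]
```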