Better Best of Both Worlds Bounds for Bandits with Switching Costs

Idan Amir · Guy Azov · Tomer Koren · Roi Livni

Hall J #819

Keywords: [ Optimization ] [ Online Learning ] [ Multi-armed Bandits ] [ decision making ]

[ Abstract ]
[ Paper [ Slides [ Poster [ OpenReview
Tue 29 Nov 9 a.m. PST — 11 a.m. PST

Abstract: We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer et al., 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal regret bound (up to logarithmic factors) of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting and a bound of $\mathcal{O}(\min\{\log (T)/\Delta^2,T^{2/3}\})$ in the stochastically-constrained regime, both with (unit) switching costs, where $\Delta$ is the gap between the arms. In the stochastically constrained case, our bound improves over previous results due to Rouyer et al., 2021, that achieved regret of $\mathcal{O}(T^{1/3}/\Delta)$. We accompany our results with a lower bound showing that, in general, $\tilde{\mathcal{\Omega}}(\min\{1/\Delta^2,T^{2/3}\})$ switching cost regret is unavoidable in the stochastically-constrained case for algorithms with $\mathcal{O}(T^{2/3})$ worst-case switching cost regret.

Chat is not available.