

Poster

Better Best of Both Worlds Bounds for Bandits with Switching Costs

Idan Amir · Guy Azov · Tomer Koren · Roi Livni

Hall J (level 1) #819

Keywords: [ decision making ] [ Multi-armed Bandits ] [ Online Learning ] [ Optimization ]


Abstract: We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer et al., 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound (up to logarithmic factors) of $O(T^{2/3})$ in the oblivious adversarial setting and a bound of $O(\min\{\log(T)/\Delta^2, T^{2/3}\})$ in the stochastically constrained regime, both with (unit) switching costs, where $\Delta$ is the gap between the arms. In the stochastically constrained case, our bound improves over the previous result of Rouyer et al., 2021, who achieved regret of $O(T^{1/3}/\Delta)$. We accompany our results with a lower bound showing that, in general, $\widetilde{\Omega}(\min\{1/\Delta^2, T^{2/3}\})$ switching cost regret is unavoidable in the stochastically constrained case for any algorithm with $O(T^{2/3})$ worst-case switching cost regret.
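The improvement in the stochastically constrained regime can be sketched numerically: for a fixed gap $\Delta$, the new bound $\min\{\log(T)/\Delta^2, T^{2/3}\}$ eventually stops growing with $T$, whereas the prior $T^{1/3}/\Delta$ bound keeps growing. A minimal illustration, with illustrative values of $T$ and $\Delta$ (constants and lower-order factors are omitted, so these are orders of magnitude, not exact regret values):

```python
import math

def prior_bound(T, delta):
    # Rouyer et al., 2021 bound in the stochastically constrained case: O(T^{1/3} / Δ)
    return T ** (1 / 3) / delta

def new_bound(T, delta):
    # This paper's bound: O(min{log(T) / Δ^2, T^{2/3}})
    return min(math.log(T) / delta ** 2, T ** (2 / 3))

# For fixed Δ, the new bound is governed by log(T)/Δ^2 once that term
# falls below T^{2/3}, so it grows only logarithmically in T, while the
# prior bound continues to grow polynomially as T^{1/3}.
delta = 0.5
for T in (10 ** 4, 10 ** 6, 10 ** 8):
    print(T, round(prior_bound(T, delta), 1), round(new_bound(T, delta), 1))
```

For small $T$ or very small $\Delta$, the $T^{2/3}$ term in the minimum dominates, matching the worst-case adversarial rate; this is exactly the regime the lower bound $\widetilde{\Omega}(\min\{1/\Delta^2, T^{2/3}\})$ addresses.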
