Poster
Better Best of Both Worlds Bounds for Bandits with Switching Costs
Idan Amir · Guy Azov · Tomer Koren · Roi Livni
Hall J (level 1) #819
Keywords: [ decision making ] [ Multi-armed Bandits ] [ Online Learning ] [ Optimization ]
Abstract:
We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer et al., 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound (up to logarithmic factors) of O(T^{2/3}) in the oblivious adversarial setting and a bound of O(min{log(T)/Δ^2, T^{2/3}}) in the stochastically-constrained regime, both with (unit) switching costs, where Δ is the gap between the arms. In the stochastically-constrained case, our bound improves over the previous result of Rouyer et al., 2021, which achieved regret of O(T^{1/3}/Δ). We accompany our results with a lower bound showing that, in general, Ω̃(min{1/Δ^2, T^{2/3}}) switching-cost regret is unavoidable in the stochastically-constrained case for algorithms with O(T^{2/3}) worst-case switching-cost regret.
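To make the switching-cost setting concrete, here is a minimal, hypothetical sketch (not the authors' algorithm) of the standard batching idea behind T^{2/3}-type bounds: a learner that commits to a single arm per batch pays at most one unit switching cost per batch, so with batches of length roughly T^{1/3} it incurs at most about T^{2/3} switches. The greedy arm-selection rule and Bernoulli rewards below are illustrative assumptions only.

```python
import random


def batched_play(T, means, batch_size):
    """Toy illustration of switching-cost regret: the learner commits to one
    arm per batch, so the number of switches is at most T / batch_size.
    (Hypothetical sketch; not the algorithm from the paper.)"""
    random.seed(0)
    num_arms = len(means)
    estimates = [0.0] * num_arms  # empirical mean reward per arm
    counts = [0] * num_arms
    switches = 0
    total_reward = 0.0
    prev_arm = None
    for batch_start in range(0, T, batch_size):
        # Illustrative rule: play the arm with the highest empirical mean.
        arm = max(range(num_arms), key=lambda a: estimates[a])
        if prev_arm is not None and arm != prev_arm:
            switches += 1  # unit switching cost per arm change
        prev_arm = arm
        for _ in range(min(batch_size, T - batch_start)):
            # Bernoulli reward with the arm's mean (assumed reward model).
            r = 1.0 if random.random() < means[arm] else 0.0
            counts[arm] += 1
            estimates[arm] += (r - estimates[arm]) / counts[arm]
            total_reward += r
    # Switching-cost regret: reward gap to the best arm plus total switch cost.
    regret = T * max(means) - total_reward + switches
    return regret, switches
```

Running `batched_play(1000, [0.4, 0.6], 10)` performs at most 1000/10 = 100 switches by construction; choosing `batch_size` on the order of T^{1/3} is what caps the switching cost at order T^{2/3} in this style of argument.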