Poster
Bandits with Feedback Graphs and Switching Costs
Raman Arora · Teodor Vanislavov Marinov · Mehryar Mohri
East Exhibition Hall B, C #48
Keywords: [ Online Learning ] [ Algorithms ] [ Bandit Algorithms ]
[
Abstract
]
Abstract:
We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a \emph{feedback graph} and where shifting to a new action incurs a fixed \emph{switching cost}. We give two new algorithms for this problem in the informed setting. Our best algorithm achieves a pseudo-regret of , where is the domination number of the feedback graph. This significantly improves upon the previous best result for the same problem, which was based on the independence number of . We also present matching lower bounds for our result that we describe in detail. Finally, we give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
Live content is unavailable. Log in and register to view live content