Skip to yearly menu bar Skip to main content


Poster

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

Julian Zimmert · Tor Lattimore

East Exhibition Hall B, C #12

Keywords: [ Bandit Algorithms ] [ Algorithms ] [ Information Theory ] [ Algorithms -> Online Learning; Theory ]


Abstract: The information-theoretic analysis by Russo and Van Roy [2014] in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications are derived from existing techniques from online convex optimisation. Besides this, we improve best known regret guarantees for $k$-armed adversarial bandits, online linear optimisation on $\ell_p$-balls and bandits with graph feedback.

Live content is unavailable. Log in and register to view live content