Poster
Cornering Stationary and Restless Mixing Bandits with Remix-UCB
Julien Audiffren · Liva Ralaivola
210 C #94
[
Abstract
]
Abstract:
We study the restless bandit problem where arms are associated with stationary -mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of carefully recovering some independence by `ignoring' the values of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off, which we do by considering the idea of a {\em waiting arm} in the new Remix-UCB algorithm, a generalization of Improved-UCB for the problem at hand, that we introduce. We provide a regret analysis for this bandit strategy; two noticeable features of Remix-UCB are that i) it reduces to the regular Improved-UCB when the -mixing coefficients are all , i.e. when the i.i.d scenario is recovered, and ii) when , it is able to ensure a controlled regret of order where encodes the distance between the best arm and the best suboptimal arm, even in the case when , i.e. the case when the -mixing coefficients {\em are not} summable.
Live content is unavailable. Log in and register to view live content