

Poster

Cornering Stationary and Restless Mixing Bandits with Remix-UCB

Julien Audiffren · Liva Ralaivola

210 C #94

Abstract: We study the restless bandit problem where arms are associated with stationary φ-mixing processes and where rewards are therefore dependent: the question that arises from this setting is that of carefully recovering some independence by 'ignoring' the values of some rewards. As we shall see, the bandit problem we tackle requires us to address the exploration/exploitation/independence trade-off, which we do by considering the idea of a {\em waiting arm} in the new Remix-UCB algorithm we introduce, a generalization of Improved-UCB tailored to the problem at hand. We provide a regret analysis for this bandit strategy; two noticeable features of Remix-UCB are that i) it reduces to the regular Improved-UCB when the φ-mixing coefficients are all 0, i.e., when the i.i.d. scenario is recovered, and ii) when $\varphi(n) = O(n^{-\alpha})$, it is able to ensure a controlled regret of order $\tilde{O}\!\left(\Delta^{(\alpha-2)/\alpha}\log^{1/\alpha} T\right)$, where $\Delta$ encodes the distance between the best arm and the best suboptimal arm, even in the case when $\alpha < 1$, i.e., the case when the φ-mixing coefficients {\em are not} summable.
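
To make the abstract's idea more concrete, below is a minimal sketch of an Improved-UCB-style arm-elimination loop in which a fixed number of pulls are discarded between the rewards that enter the confidence bounds, as a stand-in for the waiting-arm idea. This is an illustration under simplifying assumptions only: the function name `improved_ucb_with_waiting`, the `pull` interface, and the fixed `wait_steps` parameter are hypothetical, and this is not the paper's Remix-UCB, whose waiting schedule and confidence widths are derived from the φ-mixing coefficients.

```python
import math

import numpy as np


def improved_ucb_with_waiting(pull, n_arms, horizon, wait_steps=0):
    """Arm-elimination loop in the spirit of Improved-UCB, with a crude
    'waiting' mechanism: after every reward that is kept for the statistics,
    `wait_steps` additional pulls of the same arm are spent and discarded.

    `pull(arm)` is assumed to return a reward in [0, 1].  Illustrative sketch
    only; it is not the Remix-UCB algorithm, whose waiting schedule depends on
    the φ-mixing coefficients.
    """
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms, dtype=int)
    delta_tilde = 1.0  # current guess of the optimality gap, halved each round
    t = 0              # total number of pulls spent, waiting pulls included

    while t < horizon and len(active) > 1:
        # Number of retained samples each active arm should have this round.
        log_term = math.log(max(horizon * delta_tilde ** 2, math.e))
        n_m = math.ceil(2.0 * log_term / delta_tilde ** 2)

        for arm in active:
            while counts[arm] < n_m and t < horizon:
                sums[arm] += pull(arm)   # reward that enters the estimates
                counts[arm] += 1
                t += 1
                for _ in range(wait_steps):  # 'waiting': rewards are ignored
                    if t >= horizon:
                        break
                    pull(arm)
                    t += 1

        # Standard elimination test: drop arms whose optimistic estimate falls
        # below the best pessimistic estimate among the remaining arms.
        half_width = math.sqrt(log_term / (2.0 * n_m))
        means = sums / np.maximum(counts, 1)
        best_lower = max(means[a] - half_width for a in active)
        active = [a for a in active if means[a] + half_width >= best_lower]
        delta_tilde /= 2.0

    return active, sums / np.maximum(counts, 1)


# Toy usage with independent Bernoulli arms (illustration only).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = [0.40, 0.50, 0.65]
    remaining, estimates = improved_ucb_with_waiting(
        lambda a: float(rng.random() < probs[a]),
        n_arms=3, horizon=20_000, wait_steps=2)
    print(remaining, estimates)
```

A fixed `wait_steps` is only the simplest conceivable schedule; the point of the abstract is precisely that how long to wait should be tied to the mixing rate, which is what produces the $(\alpha-2)/\alpha$ exponent in the regret bound above.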
