
Weighted Linear Bandits for Non-Stationary Environments
Yoan Russac · Claire Vernade · Olivier Cappé

Tue Dec 10 05:30 PM -- 07:30 PM (PST) @ East Exhibition Hall B + C #23

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order d^{2/3} B_T^{1/3} T^{2/3}, where B_T is a measure of non-stationarity (d and T being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.
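To illustrate the core idea, here is a minimal sketch of discounted least-squares estimation, the building block the abstract describes: exponential weights down-weight old observations so the estimator can track a drifting parameter. The discount factor `gamma`, the regularizer `lam`, and the function name are illustrative choices, not the paper's exact algorithm or tuned values.

```python
import numpy as np

def discounted_ls(actions, rewards, gamma=0.95, lam=1.0):
    """Sketch of weighted least-squares with exponential forgetting.

    actions: (t, d) array of context vectors a_1..a_t
    rewards: (t,) array of observed rewards r_1..r_t
    gamma:   discount in (0, 1]; smaller values forget the past faster
    lam:     ridge regularization (illustrative, not the paper's choice)
    """
    t, d = actions.shape
    V = lam * np.eye(d)               # regularized weighted design matrix
    b = np.zeros(d)
    for s in range(t):
        w = gamma ** (t - 1 - s)      # exponential weight: recent data counts more
        V += w * np.outer(actions[s], actions[s])
        b += w * rewards[s] * actions[s]
    return np.linalg.solve(V, b)      # theta_hat = V^{-1} b
```

In a full optimistic algorithm, this estimate would be combined with a confidence ellipsoid around `theta_hat` to select actions; the sketch above only shows the forgetting mechanism.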

Author Information

Yoan Russac (Ecole Normale Supérieure)
Claire Vernade (Google DeepMind)

Claire received her PhD from Telecom ParisTech (S2A team, advised by Olivier Cappé) in October 2017 and is now a post-doc at Amazon CoreAI in Berlin and at the University of Magdeburg, working with Alexandra Carpentier. Her work focuses on designing and analyzing bandit models for recommendation, A/B testing, and other marketing-related applications. More broadly, she is interested in modeling external sources of uncertainty -- or bias -- to understand the impact they may have on the complexity of learning and on the final result.

Olivier Cappé (CNRS)
