
Parametric Bandits: The Generalized Linear Case
Sarah Filippi · Olivier Cappé · Aurélien Garivier · Csaba Szepesvari

Mon Dec 06 12:00 AM -- 12:00 AM (PST)

We consider structured multi-armed bandit tasks in which the agent is guided by prior structural knowledge that can be exploited to efficiently select the optimal arm(s) in situations where the number of arms is large, or even infinite. We propose a new optimistic, UCB-like, algorithm for non-linearly parameterized bandit problems using the Generalized Linear Model (GLM) framework. We analyze the regret of the proposed algorithm, termed GLM-UCB, obtaining results similar to those recently proved in the literature for the linear regression case. The analysis also highlights a key difficulty of the non-linear case, which GLM-UCB resolves by focusing on the reward space rather than on the parameter space. Moreover, as the practical efficiency of current parameterized bandit algorithms is often disappointing, we provide an asymptotic argument leading to significantly faster convergence. Simulation studies on real data sets illustrate the performance and the robustness of the proposed GLM-UCB approach.
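The optimistic strategy the abstract describes can be illustrated with a minimal toy sketch. Everything below is an assumption for illustration, not the paper's exact algorithm: the logistic link, the gradient-based quasi-MLE solver, the exploration rate `rho`, and all function names are choices made here to make the "mean reward plus confidence width, evaluated in the reward space" idea concrete.

```python
import numpy as np

def mu(z):
    # Logistic inverse link of a Bernoulli GLM (an illustrative choice).
    return 1.0 / (1.0 + np.exp(-z))

def glm_ucb(arms, reward_fn, horizon, rho=lambda t: np.sqrt(2.0 * np.log(t + 1)), seed=0):
    """Toy GLM-UCB-style loop over a finite arm set.

    arms: (K, d) array of arm feature vectors.
    reward_fn(k, rng): stochastic reward in [0, 1] for arm k.
    Returns the average reward collected over `horizon` rounds.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    A = np.eye(d)                  # regularized design matrix
    X, R = [], []                  # history of played features and rewards
    theta = np.zeros(d)
    total = 0.0
    for t in range(horizon):
        if t < K:
            k = t                  # play each arm once to initialize
        else:
            # A few gradient steps toward the quasi-MLE of theta.
            Xh, Rh = np.array(X), np.array(R)
            for _ in range(50):
                grad = Xh.T @ (mu(Xh @ theta) - Rh)
                theta -= 0.1 * grad / len(Rh)
            A_inv = np.linalg.inv(A)
            # Optimism in the reward space: predicted mean mu(x'theta)
            # plus an exploration bonus proportional to ||x||_{A^{-1}}.
            width = np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
            k = int(np.argmax(mu(arms @ theta) + rho(t) * width))
        r = reward_fn(k, rng)
        A += np.outer(arms[k], arms[k])
        X.append(arms[k])
        R.append(r)
        total += r
    return total / horizon
```

Note the arm index is chosen by maximizing an upper confidence bound on the *reward* `mu(x'theta)`, mirroring the abstract's point that working in the reward space, rather than bounding the parameter estimate directly, sidesteps the key non-linear difficulty.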

Author Information

Sarah Filippi (Imperial College London)
Olivier Cappé (CNRS)
Aurélien Garivier (LTCI -CNRS Télécom ParisTech)
Csaba Szepesvari (University of Alberta)
