Risk-Aversion in Multi-armed Bandits
Amir Sani · Alessandro Lazaric · Remi Munos

Tue Dec 04 07:00 PM -- 12:00 AM (PST) @ Harrah’s Special Events Center 2nd Floor

In stochastic multi-armed bandits, the objective is to solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion, where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be intrinsically more difficult than the standard multi-armed bandit setting, due in part to an exploration risk that introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, investigate their theoretical guarantees, and report preliminary empirical results.
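
To make the idea of a risk-return trade-off concrete, here is a minimal sketch that ranks arms by an empirical mean-variance score. The abstract does not state the exact criterion, so the form used here (variance minus a risk-tolerance coefficient `rho` times the mean, lower is better) is an assumption. The function names and the reward samples are hypothetical, and the sketch is not one of the two algorithms the paper introduces.

```python
import statistics

def mean_variance_score(rewards, rho):
    """Empirical variance minus rho times empirical mean (lower is better).

    Assumed form of the risk-return criterion; rho is a hypothetical
    risk-tolerance coefficient (larger rho puts more weight on the mean).
    """
    mean = statistics.fmean(rewards)
    variance = statistics.pvariance(rewards, mu=mean)
    return variance - rho * mean

def best_arm(samples_per_arm, rho):
    """Index of the arm with the lowest mean-variance score."""
    scores = [mean_variance_score(r, rho) for r in samples_per_arm]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical observed rewards: arm 0 has the higher mean but is far noisier.
arms = [
    [0.0, 2.0, 0.0, 2.0],  # mean 1.0, variance 1.0
    [0.7, 0.9, 0.7, 0.9],  # mean 0.8, variance 0.01
]

risk_averse_choice = best_arm(arms, rho=1.0)     # variance weighs heavily
mean_seeking_choice = best_arm(arms, rho=10.0)   # mean dominates the score
```

With a small `rho` the low-variance arm wins even though its mean is lower. With a large `rho` the ranking approaches the usual expected-reward objective.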

Author Information

Amir Sani (Centre d'Economie de la Sorbonne, CNRS)
Alessandro Lazaric (Facebook Artificial Intelligence Research)
Remi Munos (Google DeepMind)
