Trading off exploration and exploitation plays a key role in a number of learning tasks. For example, the bandit problem provides perhaps the simplest setting in which we must balance pulling the arm that appears most advantageous against experimenting with arms for which we lack accurate information. Similar issues arise in any learning problem where the information received depends on the choices made by the learner. Learning studies have frequently concentrated on the final performance of the learned system rather than considering the errors made during the learning process. For example, reinforcement learning has traditionally been concerned with proving convergence to an optimal policy, while analysis of the bandit problem has instead attempted to bound the extra loss incurred during learning compared with an a priori optimal agent. This workshop provides a focus for work concerned with on-line trading of exploration and exploitation, in particular providing a forum for extensions to the bandit problem, invited presentations by researchers working on related questions in other disciplines, as well as discussion and contributed papers.
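One standard way to manage the trade-off described above is the UCB1 index strategy (Auer, Cesa-Bianchi & Fischer, 2002), which pulls the arm maximizing an optimistic upper confidence bound on its mean reward. The following is a minimal Python sketch, not an implementation from the workshop itself; the Bernoulli bandit with arm means 0.3 and 0.7 is a hypothetical example environment.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1 sketch: play each arm once, then pick the arm maximizing
    empirical mean + sqrt(2 * ln(t) / n_i)."""
    counts = [0] * n_arms      # n_i: number of pulls of arm i
    sums = [0.0] * n_arms      # cumulative reward of arm i
    history = []
    for t in range(horizon):
        if t < n_arms:
            arm = t            # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        history.append(arm)
    return history

# Hypothetical two-armed Bernoulli bandit with means 0.3 and 0.7.
rng = random.Random(0)
means = [0.3, 0.7]
hist = ucb1(lambda a: 1.0 if rng.random() < means[a] else 0.0,
            n_arms=2, horizon=2000)
```

Because the confidence term shrinks as an arm is sampled, exploration of apparently inferior arms tapers off and, over the horizon, the better arm (index 1 here) dominates the play counts, which is exactly the bounded-extra-loss behaviour that regret analysis formalizes.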
Author Information
Peter Auer (Montanuniversitaet Leoben)
More from the Same Authors
- 2014 Workshop: Autonomously Learning Robots (Gerhard Neumann · Joelle Pineau · Peter Auer · Marc Toussaint)
- 2011 Poster: PAC-Bayesian Analysis of Contextual Bandits (Yevgeny Seldin · Peter Auer · Francois Laviolette · John Shawe-Taylor · Ronald Ortner)
- 2008 Poster: Near-optimal Regret Bounds for Reinforcement Learning (Peter Auer · Thomas Jaksch · Ronald Ortner)
- 2008 Spotlight: Near-optimal Regret Bounds for Reinforcement Learning (Peter Auer · Thomas Jaksch · Ronald Ortner)
- 2006 Poster: Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning (Peter Auer · Ronald Ortner)