Timezone: »
We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We follow on the work of Abbeel and Ng who considered a framework in which the true reward function is assumed to be a linear combination of a set of known and observable features. We give a new algorithm that, like theirs, is guaranteed to learn a policy that is nearly as good as the expert's, given enough examples. However, unlike their algorithm, we show that ours may produce a policy that is substantially better than the expert's. Moreover, our algorithm is computationally much faster, is easy to implement, and can be applied even in the absence of an expert. The method is based on a game-theoretic view of the problem, which leads naturally to a direct application of the multiplicative-weights algorithm of Freund and Schapire for playing repeated matrix games. In addition to our formal presentation and analysis of the new algorithm, we sketch how the method can be applied when the MDP itself is unknown, and we provide an experimental demonstration of the algorithm on a toy video-game environment.
Author Information
Umar Syed (University of Pennsylvania)
Robert E Schapire (MIcrosoft Research)
Robert Schapire received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991 where he remained for eleven years. At the end of 2002, he became a Professor of Computer Science at Princeton University. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize and the 2004 Kanelakkis Theory and Practice Award (both of the last two with Yoav Freund). His main research interest is in theoretical and applied machine learning.
Related Events (a corresponding poster, oral, or spotlight)
-
2007 Poster: A Multiplicative Weights Algorithm for Apprenticeship Learning »
Tue. Dec 4th 06:30 -- 06:40 PM Room
More from the Same Authors
-
2014 Poster: A Drifting-Games Analysis for Online Learning and Applications to Boosting »
Haipeng Luo · Robert E Schapire -
2010 Poster: A Reduction from Apprenticeship Learning to Classification »
Umar Syed · Robert E Schapire -
2010 Oral: Semi-Supervised Learning with Adversarially Missing Label Information »
Umar Syed · Ben Taskar -
2010 Oral: A Theory of Multiclass Boosting »
Indraneel Mukherjee · Robert E Schapire -
2010 Poster: Semi-Supervised Learning with Adversarially Missing Label Information »
Umar Syed · Ben Taskar -
2010 Poster: A Theory of Multiclass Boosting »
Indraneel Mukherjee · Robert E Schapire -
2010 Poster: Non-Stochastic Bandit Slate Problems »
Satyen Kale · Lev Reyzin · Robert E Schapire -
2009 Poster: Adapting to the Shifting Intent of Search Queries »
Umar Syed · Aleksandrs Slivkins · Nina Mishra -
2007 Oral: FilterBoost: Regression and Classification on Large Datasets »
Joseph K Bradley · Robert E Schapire -
2007 Poster: FilterBoost: Regression and Classification on Large Datasets »
Joseph K Bradley · Robert E Schapire -
2007 Tutorial: Theory and Applications of Boosting »
Robert E Schapire