Timezone: »
Poster
A Reduction from Apprenticeship Learning to Classification
Umar Syed · Robert E Schapire
We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert. We study a common approach to learning from expert demonstrations: using a classification algorithm to learn to imitate the expert's behavior. Although this straightforward learning strategy is widely-used in practice, it has been subject to very little formal analysis. We prove that, if the learned classifier has error rate $\eps$, the difference between the value of the apprentice's policy and the expert's policy is $O(\sqrt{\eps})$. Further, we prove that this difference is only $O(\eps)$ when the expert's policy is close to optimal. This latter result has an important practical consequence: Not only does imitating a near-optimal expert result in a better policy, but far fewer demonstrations are required to successfully imitate such an expert. This suggests an opportunity for substantial savings whenever the expert is known to be good, but demonstrations are expensive or difficult to obtain.
Author Information
Umar Syed (University of Pennsylvania)
Robert E Schapire (MIcrosoft Research)
Robert Schapire received his ScB in math and computer science from Brown University in 1986, and his SM (1988) and PhD (1991) from MIT under the supervision of Ronald Rivest. After a short post-doc at Harvard, he joined the technical staff at AT&T Labs (formerly AT&T Bell Laboratories) in 1991 where he remained for eleven years. At the end of 2002, he became a Professor of Computer Science at Princeton University. His awards include the 1991 ACM Doctoral Dissertation Award, the 2003 Gödel Prize and the 2004 Kanelakkis Theory and Practice Award (both of the last two with Yoav Freund). His main research interest is in theoretical and applied machine learning.
More from the Same Authors
-
2014 Poster: A Drifting-Games Analysis for Online Learning and Applications to Boosting »
Haipeng Luo · Robert E Schapire -
2010 Oral: Semi-Supervised Learning with Adversarially Missing Label Information »
Umar Syed · Ben Taskar -
2010 Oral: A Theory of Multiclass Boosting »
Indraneel Mukherjee · Robert E Schapire -
2010 Poster: Semi-Supervised Learning with Adversarially Missing Label Information »
Umar Syed · Ben Taskar -
2010 Poster: A Theory of Multiclass Boosting »
Indraneel Mukherjee · Robert E Schapire -
2010 Poster: Non-Stochastic Bandit Slate Problems »
Satyen Kale · Lev Reyzin · Robert E Schapire -
2009 Poster: Adapting to the Shifting Intent of Search Queries »
Umar Syed · Aleksandrs Slivkins · Nina Mishra -
2007 Oral: A Multiplicative Weights Algorithm for Apprenticeship Learning »
Umar Syed · Robert E Schapire -
2007 Oral: FilterBoost: Regression and Classification on Large Datasets »
Joseph K Bradley · Robert E Schapire -
2007 Poster: FilterBoost: Regression and Classification on Large Datasets »
Joseph K Bradley · Robert E Schapire -
2007 Poster: A Multiplicative Weights Algorithm for Apprenticeship Learning »
Umar Syed · Robert E Schapire -
2007 Tutorial: Theory and Applications of Boosting »
Robert E Schapire