We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimate to approximate the expected feature counts under the expert's policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating these feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal, and (ii) the dynamics of the system are known. Empirical results on gridworld and car racing problems show that our approach learns good policies from a small number of demonstrations.
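The Monte Carlo estimate mentioned in the abstract is the empirical average of discounted feature vectors along the demonstrated trajectories. The sketch below is a minimal illustration of that quantity, not code from the paper; the function name, signature, and discount handling are assumptions made for clarity.

```python
import numpy as np

def empirical_feature_counts(trajectories, feature_fn, gamma=0.99):
    """Monte Carlo estimate of the expert's expected discounted feature counts.

    trajectories: list of demonstrations, each a list of (state, action) pairs
    feature_fn:   maps (state, action) to a feature vector (array-like)
    gamma:        discount factor

    Returns the discounted sum of features averaged over the demonstrations.
    With few or short demonstrations this estimate can be noisy, which is the
    error the paper argues the learned policies are highly sensitive to.
    """
    totals = []
    for traj in trajectories:
        acc = 0.0
        for t, (s, a) in enumerate(traj):
            acc = acc + (gamma ** t) * np.asarray(feature_fn(s, a), dtype=float)
        totals.append(acc)
    return np.mean(totals, axis=0)
```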
Author Information
Abdeslam Boularias (Max Planck Institute for Intelligent Systems)
Brahim Chaib-draa (Laval University)
More from the Same Authors
- 2012 Poster: Gradient Weights help Nonparametric Regressors » Samory Kpotufe · Abdeslam Boularias
- 2012 Oral: Gradient Weights help Nonparametric Regressors » Samory Kpotufe · Abdeslam Boularias
- 2012 Poster: A Marginalized Particle Gaussian Process Regression » Yali Wang · Brahim Chaib-draa
- 2012 Poster: Algorithms for Learning Markov Field Policies » Abdeslam Boularias · Oliver Kroemer · Jan Peters
- 2007 Spotlight: Bayes-Adaptive POMDPs » Stephane Ross · Brahim Chaib-draa · Joelle Pineau
- 2007 Poster: Bayes-Adaptive POMDPs » Stephane Ross · Brahim Chaib-draa · Joelle Pineau
- 2007 Poster: Theoretical Analysis of Heuristic Search Methods for Online POMDPs » Stephane Ross · Joelle Pineau · Brahim Chaib-draa