Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
Chao Yang · Xiaojian Ma · Wenbing Huang · Fuchun Sun · Huaping Liu · Junzhou Huang · Chuang Gan

Tue Dec 10 05:30 PM -- 07:30 PM (PST) @ East Exhibition Hall B + C #205

This paper studies Learning from Observations (LfO), i.e., imitation learning from state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves supervision on both actions and states, LfO is more practical because it can leverage previously inapplicable resources (e.g., videos), yet it is more challenging because the expert guidance is incomplete. We investigate LfO and its difference from LfD from both theoretical and practical perspectives. We first prove that, when following the modeling approach of GAIL, the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert. More importantly, this gap is upper-bounded by a negative causal entropy, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
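The claim that the LfD/LfO gap is an inverse dynamics disagreement can be checked numerically on a toy problem. The sketch below is not the authors' code: it assumes a small tabular setting, shared transition dynamics between imitator and expert, and KL divergence as the discrepancy measure (the abstract does not specify the divergence). The function names `kl` and `decomposition` and all array shapes are hypothetical. Under these assumptions, the chain rule for KL divergence gives: divergence over (s, a) minus divergence over (s, s') equals the expected divergence between the two inverse dynamics models ρ(a | s, s').

```python
import numpy as np


def kl(p, q):
    """KL divergence between two same-shaped arrays that each sum to 1."""
    p, q = np.ravel(p), np.ravel(q)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def decomposition(rho_pi_sa, rho_e_sa, dynamics):
    """Return (LfD gap, LfO gap, inverse dynamics disagreement).

    rho_pi_sa, rho_e_sa: (S, A) state-action occupancies (imitator, expert).
    dynamics: (S, A, S) array holding P(s' | s, a), shared by both agents.
    Assumes strictly positive entries so every ratio is well defined.
    """
    # Joint occupancy over (s, a, s') under the shared dynamics.
    joint_pi = rho_pi_sa[:, :, None] * dynamics
    joint_e = rho_e_sa[:, :, None] * dynamics

    # State-transition occupancies rho(s, s'), obtained by summing out actions.
    trans_pi = joint_pi.sum(axis=1)
    trans_e = joint_e.sum(axis=1)

    # Inverse dynamics models rho(a | s, s') for each agent.
    inv_pi = joint_pi / trans_pi[:, None, :]
    inv_e = joint_e / trans_e[:, None, :]

    # Expected KL between inverse dynamics models under the imitator's joint.
    disagreement = float(np.sum(joint_pi * np.log(inv_pi / inv_e)))

    lfd_gap = kl(rho_pi_sa, rho_e_sa)  # divergence over (s, a)
    lfo_gap = kl(trans_pi, trans_e)    # divergence over (s, s')
    return lfd_gap, lfo_gap, disagreement


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 4, 3  # toy sizes, chosen arbitrarily
    rho_pi = rng.dirichlet(np.ones(S * A)).reshape(S, A)
    rho_e = rng.dirichlet(np.ones(S * A)).reshape(S, A)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)

    lfd, lfo, dis = decomposition(rho_pi, rho_e, P)
    print("LfD gap - LfO gap:", lfd - lfo)
    print("inverse dynamics disagreement:", dis)
```

In this toy setting the two printed quantities should agree up to floating-point error, and the disagreement is non-negative, so matching state transitions alone can never exceed what matching state-action pairs achieves. The sketch illustrates only the decomposition; it does not implement the entropy-based upper bound or the IDDM training procedure.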

Author Information

Chao Yang (Tsinghua University)
Xiaojian Ma (Tsinghua University)
Wenbing Huang (Tsinghua University)
Fuchun Sun (Tsinghua University)
Huaping Liu (Tsinghua University)
Junzhou Huang (University of Texas at Arlington / Tencent AI Lab)
Chuang Gan (MIT-IBM Watson AI Lab)
