
Workshop: Deep Reinforcement Learning

Self-Imitation Learning from Demonstrations

Georgiy Pshikhachev · Dmitry Ivanov · Vladimir Egorov · Aleksei Shpilman


Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent’s exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of the demonstrations, which are rarely optimal in realistic scenarios. Modern LfD algorithms lack provable robustness to suboptimal demonstrations and introduce additional hyperparameters to control the influence of demonstrations. To address these issues, we extend Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent’s past good experience, to the LfD setup by initializing its replay buffer with demonstrations. We call our algorithm SIL from Demonstrations (SILfD). Our theoretical analysis highlights that SILfD is safe to apply to demonstrations of any degree of suboptimality and automatically adjusts the influence of demonstrations throughout training. Our empirical investigation shows the superiority of SILfD over existing LfD algorithms in settings with suboptimal demonstrations and sparse rewards.
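To make the core idea concrete, here is a minimal NumPy sketch, assuming the standard SIL objective with a clipped advantage (R − V(s))₊, where R is the Monte Carlo return of a stored transition. It is an illustration under those assumptions, not the authors’ implementation: the names `SILReplayBuffer`, `add_episode`, `sil_losses`, and the coefficient `beta` are hypothetical. SILfD’s change amounts to calling `add_episode` on demonstration episodes before training begins.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Monte Carlo return R_t = sum_k gamma^k * r_{t+k} for one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


class SILReplayBuffer:
    """Hypothetical minimal buffer storing (state, action, return) triples."""

    def __init__(self):
        self.states, self.actions, self.returns = [], [], []

    def add_episode(self, states, actions, rewards, gamma=0.99):
        # Used both for the agent's own episodes and, in SILfD,
        # for demonstrations loaded before training starts.
        self.states.extend(states)
        self.actions.extend(actions)
        self.returns.extend(discounted_returns(rewards, gamma))


def sil_losses(log_probs, values, returns, beta=0.01):
    """SIL objective: only transitions with return above V(s) contribute.

    log_probs: log pi(a|s) for the sampled transitions
    values:    critic estimates V(s)
    returns:   stored Monte Carlo returns R
    """
    adv = np.maximum(returns - values, 0.0)      # clipped advantage (R - V)_+
    policy_loss = -np.mean(log_probs * adv)      # imitate only "good" actions
    value_loss = beta * 0.5 * np.mean(adv ** 2)  # push V(s) up towards R
    return policy_loss, value_loss, adv


if __name__ == "__main__":
    # Sparse-reward demonstration: reward only at the final step.
    buffer = SILReplayBuffer()
    buffer.add_episode(["s0", "s1", "s2"], [0, 1, 0], [0.0, 0.0, 1.0], gamma=0.9)
    returns = np.array(buffer.returns)  # approximately [0.81, 0.9, 1.0]

    # Early in training the critic underestimates, so the demo is imitated.
    _, _, adv = sil_losses(np.log([0.5, 0.5, 0.5]), np.array([0.5, 0.95, 0.2]), returns)
    print(adv)

    # Once V(s) >= R everywhere, the demonstration's weight drops to zero.
    _, _, adv = sil_losses(np.zeros(3), np.ones(3), returns)
    print(adv)
```

The clipping is what makes the mechanism self-limiting in this sketch: a demonstration influences the update only while its return exceeds the critic’s estimate, so once the agent’s own value estimates surpass a suboptimal demonstration, that demonstration contributes nothing. This is one intuitive reading of the abstract’s claim that the influence of demonstrations is adjusted automatically; the paper’s theoretical analysis is the authoritative account.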
