Workshop: Offline Reinforcement Learning

MBAIL: Multi-Batch Best Action Imitation Learning utilizing Sample Transfer and Policy Distillation

Di Wu · · David Meger · Michael Jenkin · Steve Liu · Gregory Dudek


Most online reinforcement learning (RL) algorithms require a large number of interactions with the environment to learn a reliable control policy. Unfortunately, the assumption of the availability of repeated interactions with the environment does not hold for many real-world applications. Batch RL aims to learn a good control policy from a previously collected dataset without requiring additional interactions with the environment, which are very promising in solving real-world problems. However, in the real world, we may only have a limited amount of data points for certain tasks we are interested in. Also, most of the current batch RL methods are mainly aimed to learn policy over one fixed dataset with which it is hard to learn a policy that can perform well over multiple tasks. In this work, we propose to tackle these challenges with sample transfer and policy distillation. The proposed methods are evaluated on multiple control tasks to showcase their effectiveness.

Chat is not available.