Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents
Minghuan Liu · Zhengbang Zhu · Menghui Zhu · Yuzheng Zhuang · Weinan Zhang · Jianye Hao
Event URL: https://openreview.net/forum?id=_h0fHl0VwF3

In reinforcement learning applications, agents often have to handle different input/output features because developers or physical restrictions give them different state and action spaces. This typically means re-training from scratch, which is highly sample-inefficient, especially when the agents follow similar solution steps to accomplish their tasks. In this paper, we aim to transfer pre-trained skills to alleviate this challenge. Specifically, we propose PILoT, i.e., Planning Immediate Landmarks of Targets. PILoT uses universal decoupled policy optimization to learn a goal-conditioned state planner; we then distill a goal planner that plans immediate landmarks in a model-free style and can be shared among different agents. In our experiments, we demonstrate PILoT's effectiveness on a range of transfer challenges, including few-shot transfer across action spaces and dynamics, from low-dimensional vector states to image inputs, and from a simple robot to a complicated morphology; we also show that PILoT provides a zero-shot transfer solution from a simple 2D navigation task to the harder Ant-Maze task.

Author Information

Minghuan Liu (Shanghai Jiao Tong University)
Zhengbang Zhu (Sichuan University)
Menghui Zhu (Shanghai Jiao Tong University)
Yuzheng Zhuang (Huawei Technologies Co. Ltd.)
Weinan Zhang (Shanghai Jiao Tong University)
Jianye Hao (Tianjin University)
