
Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
Wei-Cheng Tseng · Tsun-Hsuan Johnson Wang · Yen-Chen Lin · Phillip Isola

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #110

We introduce an offline multi-agent reinforcement learning (offline MARL) framework that utilizes previously collected data without additional online data collection. Our method reformulates offline MARL as a sequence modeling problem and thus builds on the simplicity and scalability of the Transformer architecture. Following the paradigm of centralized training and decentralized execution, we propose to first train a teacher policy as if the MARL dataset were generated by a single agent. After the teacher policy has identified and recombined the "good" behaviors in the dataset, we create separate student policies and distill not only the teacher policy's features but also the structural relations among different agents' features to the student policies. Despite its simplicity, the proposed method outperforms state-of-the-art model-free offline MARL baselines while being more robust to the quality of demonstrations across several environments.
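The abstract describes two distillation targets: matching each student's features to the teacher's, and matching the structural relations among different agents' features. The listing does not specify the losses, so the sketch below is a hypothetical illustration assuming mean-squared feature matching and cosine-similarity relation matching; the function name and shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def distillation_losses(teacher_feats, student_feats):
    """Hypothetical sketch of the two distillation terms described above.

    teacher_feats, student_feats: arrays of shape (n_agents, d), one
    feature vector per agent from the teacher and student policies.
    """
    # Feature distillation: pull each student's features toward the teacher's.
    feat_loss = np.mean((teacher_feats - student_feats) ** 2)

    # Structural distillation: match the pairwise relations among agents'
    # features (here modeled as cosine similarities, an assumption).
    def relations(f):
        f = f / np.linalg.norm(f, axis=1, keepdims=True)
        return f @ f.T  # (n_agents, n_agents) similarity matrix

    rel_loss = np.mean((relations(teacher_feats) - relations(student_feats)) ** 2)
    return feat_loss, rel_loss
```

In a sketch like this, the relation term lets students preserve how the teacher coordinated agents relative to one another, not just each agent's features in isolation.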

Author Information

Wei-Cheng Tseng (University of Toronto)
Tsun-Hsuan Johnson Wang (Massachusetts Institute of Technology)
Yen-Chen Lin (National Tsing Hua University)
Phillip Isola (Massachusetts Institute of Technology)
