Reinforcement learning typically relies heavily on a well-designed reward signal, which becomes more challenging to provide in cooperative multi-agent reinforcement learning. Alternatively, unsupervised reinforcement learning (URL) has recently delivered on its promise to learn useful skills and explore the environment without external supervised signals. These approaches mainly aim for a single agent to reach distinguishable states, which is insufficient for multi-agent systems because each agent interacts not only with the environment but also with the other agents. We propose Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning (SPD) to learn generic coordination policies for agents with no extrinsic reward. Specifically, we devise the Synergy Pattern Graph (SPG), a graph depicting the relationships of agents at each time step. Furthermore, we propose an episode-wise divergence measurement to approximate the discrepancy of synergy patterns. To overcome the challenge of sparse return, we decompose the discrepancy of synergy patterns into per-time-step pseudo-rewards. Empirically, we show the capacity of SPD to acquire meaningful coordination policies, such as maintaining specific formations in Multi-Agent Particle Environment and pass-and-shoot in Google Research Football. Furthermore, we demonstrate that the same instructive pretrained policy's parameters can serve as a good initialization for a series of downstream tasks' policies, achieving higher data efficiency and outperforming state-of-the-art approaches in Google Research Football.
Author Information
Yuhang Jiang (Department of Automation, Tsinghua University)
Jianzhun Shao (Tsinghua University)
Shuncheng He (Tsinghua University)
Hongchang Zhang (Tsinghua University)
Xiangyang Ji (Tsinghua University)
More from the Same Authors
- 2022 Poster: Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
  Zihan Zhang · Yuhang Jiang · Yuan Zhou · Xiangyang Ji
- 2022 Poster: Self-Organized Group for Cooperative Multi-agent Reinforcement Learning
  Jianzhun Shao · Zhiqiang Lou · Hongchang Zhang · Yuhang Jiang · Shuncheng He · Xiangyang Ji
- 2022 : An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation
  Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Rong Jin · Xiangyang Ji · Antoni Chan
- 2023 Poster: Supported Value Regularization for Offline Reinforcement Learning
  Yixiu Mao · Hongchang Zhang · Chen Chen · Yi Xu · Xiangyang Ji
- 2023 Poster: DDF-HO: Hand-Held Object Reconstruction via Conditional Directed Distance Field
  Chenyangguang Zhang · Yan Di · Ruida Zhang · Guangyao Zhai · Fabian Manhardt · Federico Tombari · Xiangyang Ji
- 2023 Poster: Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
  Jianzhun Shao · yun qu · Chen Chen · Hongchang Zhang · Xiangyang Ji
- 2023 Poster: Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
  yun qu · Boyuan Wang · Jianzhun Shao · Yuhang Jiang · Chen Chen · Zhenbin Ye · Liu Linc · Yang Feng · Lin Lai · Hongyang Qin · Minwen Deng · Juchao Zhuo · Deheng Ye · Qiang Fu · YANG GUANG · Wei Yang · Lanxiao Huang · Xiangyang Ji
- 2022 Spotlight: Lightning Talks 5A-3
  Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng
- 2022 Spotlight: Self-Organized Group for Cooperative Multi-agent Reinforcement Learning
  Jianzhun Shao · Zhiqiang Lou · Hongchang Zhang · Yuhang Jiang · Shuncheng He · Xiangyang Ji
- 2022 Spotlight: Lightning Talks 2A-3
  David Buterez · Chengan He · Xuan Kan · Yutong Lin · Konstantin Schürholt · Yu Yang · Louis Annabi · Wei Dai · Xiaotian Cheng · Alexandre Pitti · Ze Liu · Jon Paul Janet · Jun Saito · Boris Knyazev · Mathias Quoy · Zheng Zhang · James Zachary · Steven J Kiddle · Xavier Giro-i-Nieto · Chang Liu · Hejie Cui · Zilong Zhang · Hakan Bilen · Damian Borth · Dino Oglic · Holly Rushmeier · Han Hu · Xiangyang Ji · Yi Zhou · Nanning Zheng · Ying Guo · Pietro Liò · Stephen Lin · Carl Yang · Yue Cao
- 2022 Spotlight: Distilling Representations from GAN Generator via Squeeze and Span
  Yu Yang · Xiaotian Cheng · Chang Liu · Hakan Bilen · Xiangyang Ji
- 2022 Spotlight: Lightning Talks 1B-4
  Andrei Atanov · Shiqi Yang · Wanshan Li · Yongchang Hao · Ziquan Liu · Jiaxin Shi · Anton Plaksin · Jiaxiang Chen · Ziqi Pan · yaxing wang · Yuxin Liu · Stepan Martyanov · Alessandro Rinaldo · Yuhao Zhou · Li Niu · Qingyuan Yang · Andrei Filatov · Yi Xu · Liqing Zhang · Lili Mou · Ruomin Huang · Teresa Yeo · kai wang · Daren Wang · Jessica Hwang · Yuanhong Xu · Qi Qian · Hu Ding · Michalis Titsias · Shangling Jui · Ajay Sohmshetty · Lester Mackey · Joost van de Weijer · Hao Li · Amir Zamir · Xiangyang Ji · Antoni Chan · Rong Jin
- 2022 Spotlight: Improved Fine-Tuning by Better Leveraging Pre-Training Data
  Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Xiangyang Ji · Antoni Chan · Rong Jin
- 2022 Poster: Distilling Representations from GAN Generator via Squeeze and Span
  Yu Yang · Xiaotian Cheng · Chang Liu · Hakan Bilen · Xiangyang Ji
- 2022 Poster: Improved Fine-Tuning by Better Leveraging Pre-Training Data
  Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Xiangyang Ji · Antoni Chan · Rong Jin
- 2021 Poster: Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
  Zihan Zhang · Jiaqi Yang · Xiangyang Ji · Simon Du
- 2021 Poster: TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification
  Zhuchen Shao · Hao Bian · Yang Chen · Yifeng Wang · Jian Zhang · Xiangyang Ji · yongbing zhang
- 2020 Poster: Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
  Zihan Zhang · Yuan Zhou · Xiangyang Ji
- 2019 Poster: Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
  Zihan Zhang · Xiangyang Ji