Multiagent Q-learning with Sub-Team Coordination
Wenhan Huang · Kai Li · Kun Shao · Tianze Zhou · Matthew Taylor · Jun Luo · Dongge Wang · Hangyu Mao · Jianye Hao · Jun Wang · Xiaotie Deng

Thu Dec 08 09:00 AM -- 11:00 AM (PST)

In many real-world cooperative multiagent reinforcement learning (MARL) tasks, teams of agents can rehearse together before deployment, but communication constraints may then force individual agents to execute independently once deployed. Centralized training and decentralized execution (CTDE), which focuses mainly on this setting, has become increasingly popular in recent years. In the value-based MARL branch, a credit assignment mechanism is typically used to factorize the team reward into each individual’s reward. Individual-global-max (IGM) is a condition on the factorization ensuring that agents’ action choices coincide with the team’s optimal joint action. However, current architectures fail to consider local coordination within sub-teams, which could be exploited for more effective factorization and, in turn, faster learning. We propose a novel value factorization framework, called multiagent Q-learning with sub-team coordination (QSCAN), to flexibly represent sub-team coordination while honoring the IGM condition. QSCAN encompasses the full spectrum of sub-team coordination according to sub-team size, ranging from the monotonic value function class to the entire IGM function class, with familiar methods such as QMIX and QPLEX located at the respective extremes of the spectrum. Experimental results show that QSCAN’s performance dominates state-of-the-art methods in matrix games, predator-prey tasks, and the Switch challenge in MA-Gym. Additionally, QSCAN achieves performance comparable to those methods in a selection of StarCraft II micro-management tasks.
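As a rough illustration of the IGM condition mentioned in the abstract, the sketch below checks whether the greedy joint action of a joint value table matches the tuple of each agent's individually greedy action. It is not the QSCAN architecture: the utility values, the weights, and the function names (`greedy_joint_action`, `decentralized_greedy`, `satisfies_igm`) are all hypothetical, and the "monotonic mixer" is simply a non-negative weighted sum chosen for brevity.

```python
from itertools import product


def greedy_joint_action(q_tot):
    """Return the joint action that maximizes a joint value table {(a1, a2): value}."""
    return max(q_tot, key=q_tot.get)


def decentralized_greedy(per_agent_q):
    """Each agent independently picks the argmax of its own utility list."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def satisfies_igm(q_tot, per_agent_q):
    """IGM check: the joint argmax must equal the tuple of individual argmaxes."""
    return greedy_joint_action(q_tot) == decentralized_greedy(per_agent_q)


# Hypothetical per-agent utilities: two agents, three actions each.
per_agent_q = [[1.0, 3.0, 2.0], [0.5, 0.2, 4.0]]

# Assumed monotonic mixer: a weighted sum with strictly positive weights,
# so raising any agent's utility can only raise the joint value.
weights = [2.0, 0.5]
q_monotonic = {
    joint: sum(w * q[a] for w, q, a in zip(weights, per_agent_q, joint))
    for joint in product(range(3), repeat=2)
}

# A joint table whose best joint action disagrees with the individual argmaxes.
q_mismatch = dict(q_monotonic)
q_mismatch[(0, 0)] = 100.0

igm_monotonic = satisfies_igm(q_monotonic, per_agent_q)
igm_mismatch = satisfies_igm(q_mismatch, per_agent_q)
print(igm_monotonic, igm_mismatch)
```

The monotonic table passes the check because each agent's own argmax also maximizes the weighted sum, while the second table fails because its joint optimum cannot be recovered from these per-agent utilities alone.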

Author Information

Wenhan Huang (Shanghai Jiao Tong University)
Kai Li (Huawei Noah's Ark Lab)
Kun Shao (Huawei Noah's Ark Lab)
Tianze Zhou (Beijing Institute of Technology)
Matthew Taylor (U. of Alberta)
Jun Luo (Huawei Technologies Ltd.)
Dongge Wang (Swiss Federal Institute of Technology Lausanne)
Hangyu Mao (Huawei Technologies Co., Ltd.)
Jianye Hao (Tianjin University)
Jun Wang (UCL)
Xiaotie Deng (Peking University)
