Fully decentralized multi-agent reinforcement learning has shown great potential for many real-world cooperative tasks in which global information, e.g., the actions of other agents, is not accessible. Although independent Q-learning is widely used for decentralized training, the transition probabilities each agent experiences are non-stationary because the other agents update their policies simultaneously, so the convergence of independent Q-learning is not guaranteed. To deal with this non-stationarity, we first introduce stationary ideal transition probabilities, under which independent Q-learning can converge to the global optimum. We then propose I2Q, a fully decentralized method that performs independent Q-learning on a modeled ideal transition function to reach the global optimum. In I2Q, the ideal transition function is modeled in a fully decentralized way, independently of the learned policies of other agents, which frees I2Q from non-stationarity and lets it learn the optimal policy. Empirically, we show that I2Q can achieve remarkable improvement in a variety of cooperative multi-agent tasks.
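The contrast between ordinary independent Q-learning and learning on an "ideal" transition can be sketched on a toy problem. The example below assumes a one-step, deterministic cooperative matrix game (the climbing game) with a shared team reward; all names (`PAYOFF`, `local_view`, `independent_q`, `ideal_q`) are hypothetical. Each agent sees only its own action, an opaque next-state id, and the reward. `independent_q` averages returns per own action, which is where independent Q-learning settles while the teammate happens to play uniformly (and which shifts whenever the teammate's policy changes). `ideal_q` instead assumes that, for each own action, the best reachable next state is the one reached, so it depends only on which outcomes are reachable, not on how often the teammate picks them. This is a simplified illustration of the idea under these assumptions, computing the fixed points directly rather than running incremental updates; it is not the authors' I2Q implementation, which models the ideal transition function for general multi-step tasks.

```python
import random

# Assumed toy task: the climbing game, a one-step cooperative matrix game.
# PAYOFF[a0][a1] is the shared team reward for agent 0 playing a0, agent 1 playing a1.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]
N_ACTIONS = 3


def sample_episodes(n, rng):
    """Both agents explore uniformly at random; returns joint actions."""
    return [(rng.randrange(N_ACTIONS), rng.randrange(N_ACTIONS)) for _ in range(n)]


def local_view(episodes, agent):
    """What one agent observes: (own action, next-state id, reward).

    The next-state id encodes the outcome, but the agent never sees its
    teammate's action or policy directly.
    """
    return [((a0, a1)[agent], a0 * N_ACTIONS + a1, PAYOFF[a0][a1])
            for a0, a1 in episodes]


def independent_q(view):
    """Sample-average value per own action: where independent Q-learning
    settles while the teammate keeps playing uniformly. It moves whenever
    the teammate's policy changes (non-stationarity)."""
    totals, counts = [0.0] * N_ACTIONS, [0] * N_ACTIONS
    for a, _, r in view:
        totals[a] += r
        counts[a] += 1
    return [t / c for t, c in zip(totals, counts)]


def ideal_q(view):
    """Value under a toy 'ideal' transition: for each own action, assume the
    best next state reachable with that action is the one reached. Depends
    only on which outcomes are reachable, not on how often they occur."""
    reachable = {a: set() for a in range(N_ACTIONS)}
    reward_of = {}
    for a, s_next, r in view:
        reachable[a].add(s_next)
        reward_of[s_next] = r  # reward is a deterministic function of the outcome
    return [max(reward_of[s] for s in reachable[a]) for a in range(N_ACTIONS)]


def greedy(q):
    """Index of the highest-valued action."""
    return max(range(N_ACTIONS), key=lambda a: q[a])


episodes = sample_episodes(3000, random.Random(0))
views = [local_view(episodes, i) for i in range(2)]

indep = tuple(greedy(independent_q(v)) for v in views)
ideal = tuple(greedy(ideal_q(v)) for v in views)
print("independent:", indep, "team reward", PAYOFF[indep[0]][indep[1]])
print("ideal:      ", ideal, "team reward", PAYOFF[ideal[0]][ideal[1]])
```

In this toy setting the averaged values steer both independent learners to joint action (2, 2) with team reward 5, whereas the ideal-transition values ([11, 7, 5] for agent 0 and [11, 7, 6] for agent 1) lead both agents to (0, 0), the global optimum with reward 11, without either agent modeling the other's policy.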
Author Information
Jiechuan Jiang (Peking University)
Zongqing Lu (Peking University)
More from the Same Authors
2022 Poster: Model-Based Opponent Modeling
XiaoPeng Yu · Jiechuan Jiang · Wanpeng Zhang · Haobin Jiang · Zongqing Lu
2022 Poster: Learning to Share in Networked Multi-Agent Reinforcement Learning
Yuxuan Yi · Ge Li · Yaowei Wang · Zongqing Lu
2022 Poster: Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
Jiafei Lyu · Xiu Li · Zongqing Lu
2022 Poster: Mildly Conservative Q-Learning for Offline Reinforcement Learning
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu
2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang
2022 : State Advantage Weighting for Offline RL
Jiafei Lyu · aicheng Gong · Le Wan · Zongqing Lu · Xiu Li
2022 Spotlight: Mildly Conservative Q-Learning for Offline Reinforcement Learning
Jiafei Lyu · Xiaoteng Ma · Xiu Li · Zongqing Lu
2022 Spotlight: Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
Jiafei Lyu · Xiu Li · Zongqing Lu
2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang
2020 Poster: Learning Individually Inferred Communication for Multi-Agent Cooperation
gang Ding · Tiejun Huang · Zongqing Lu
2020 Oral: Learning Individually Inferred Communication for Multi-Agent Cooperation
gang Ding · Tiejun Huang · Zongqing Lu
2019 Poster: Learning Fairness in Multi-Agent Systems
Jiechuan Jiang · Zongqing Lu
2018 Poster: Learning Attentional Communication for Multi-Agent Cooperation
Jiechuan Jiang · Zongqing Lu