We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea is to adapt the step size of the policy update in a coordinated way across multiple agents. We prove that policy improvement is monotonic when optimizing a theoretically grounded joint objective, and derive a simplified optimization objective from it through a set of approximations. We then show that this simplified objective can be interpreted as performing dynamic credit assignment among agents, thereby alleviating the high variance that arises when agent policies are updated concurrently. Finally, we demonstrate that CoPPO outperforms several strong baselines and is competitive with the latest multi-agent PPO method (i.e., MAPPO) under typical multi-agent settings, including cooperative matrix games and StarCraft II micromanagement tasks.
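The abstract does not state the objective itself, so the following is only an illustrative sketch of what a "coordinated" clipped surrogate could look like for one agent on one sample. It assumes that each agent's step size is coupled to the others by multiplying its own probability ratio with a separately clipped product of the other agents' ratios. The function name `coordinated_surrogate` and the parameters `eps_own` and `eps_others` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def coordinated_surrogate(
    ratios: Sequence[float],
    agent: int,
    advantage: float,
    eps_own: float = 0.2,      # assumed clipping range for the joint term
    eps_others: float = 0.1,   # assumed (tighter) range for the other agents
) -> float:
    """Per-sample clipped surrogate for one agent (illustrative assumption,
    not the paper's exact formulation).

    ratios[j] is the new/old action-probability ratio of agent j.
    """
    # Product of the other agents' ratios, clipped so that large concurrent
    # updates by the other agents cannot dominate this agent's update.
    others = math.prod(r for j, r in enumerate(ratios) if j != agent)
    others = clip(others, 1.0 - eps_others, 1.0 + eps_others)

    # Standard PPO-style pessimistic bound, applied to the coupled ratio.
    joint = others * ratios[agent]
    unclipped = joint * advantage
    clipped = clip(joint, 1.0 - eps_own, 1.0 + eps_own) * advantage
    return min(unclipped, clipped)
```

Under these assumptions, the clipped product of the other agents' ratios rescales the advantage that agent `agent` sees on each sample, which is one way to read the abstract's "dynamic credit assignment". With a single agent the product is empty (equal to 1) and the expression reduces to the usual PPO clipped surrogate.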
Author Information
Zifan Wu (Sun Yat-sen University)
Chao Yu (Sun Yat-sen University)
Deheng Ye (Tencent)
Junge Zhang (CASIA)
haiyin piao (Northwestern Polytechnical University)
Hankz Hankui Zhuo (Sun Yat-sen University)
More from the Same Authors
2021 : TiKick: Toward Playing Multi-agent Football Full Games from Single-agent Demonstrations
Shiyu Huang · Wenze Chen · Longfei Zhang · Shizhen Xu · Ziyang Li · Fengming Zhu · Deheng Ye · Ting Chen · Jun Zhu
2022 Poster: Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
Zifan Wu · Chao Yu · Chen Chen · Jianye Hao · Hankz Hankui Zhuo
2022 Poster: A Unified Diversity Measure for Multiagent Reinforcement Learning
Zongkai Liu · Chao Yu · Yaodong Yang · peng sun · Zifan Wu · Yuan Li
2022 Spotlight: Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
Zifan Wu · Chao Yu · Chen Chen · Jianye Hao · Hankz Hankui Zhuo
2022 Poster: Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning
Jifeng Hu · Yanchao Sun · Hechang Chen · Sili Huang · haiyin piao · Yi Chang · Lichao Sun
2022 Poster: Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning
Hua Wei · Jingxiao Chen · Xiyang Ji · Hongyang Qin · Minwen Deng · Siqin Li · Liang Wang · Weinan Zhang · Yong Yu · Liu Linc · Lanxiao Huang · Deheng Ye · Qiang Fu · Wei Yang
2021 Poster: Learning Diverse Policies in MOBA Games via Macro-Goals
Yiming Gao · Bei Shi · Xueying Du · Liang Wang · Guangwei Chen · Zhenjie Lian · Fuhao Qiu · GUOAN HAN · Weixuan Wang · Deheng Ye · Qiang Fu · Wei Yang · Lanxiao Huang
2020 Poster: Towards Playing Full MOBA Games with Deep Reinforcement Learning
Deheng Ye · Guibin Chen · Wen Zhang · Sheng Chen · Bo Yuan · Bo Liu · Jia Chen · Zhao Liu · Fuhao Qiu · Hongsheng Yu · Yinyuting Yin · Bei Shi · Liang Wang · Tengfei Shi · Qiang Fu · Wei Yang · Lanxiao Huang · Wei Liu
2019 Poster: Transductive Zero-Shot Learning with Visual Structure Constraint
Ziyu Wan · Dongdong Chen · Yan Li · Xingguang Yan · Junge Zhang · Yizhou Yu · Jing Liao
2012 Poster: Action-Model Based Multi-agent Plan Recognition
Hankz Hankui Zhuo · Qiang Yang · Subbarao Kambhampati