
Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning
Xin Zhang · Zhuqing Liu · Jia Liu · Zhengyuan Zhu · Songtao Lu

Wed Dec 08 04:30 PM -- 06:00 PM (PST)
Cooperative multi-agent reinforcement learning (MARL) has received increasing attention in recent years and has found many scientific and engineering applications. However, a key challenge arising from many cooperative MARL algorithm designs (e.g., the actor-critic framework) is the policy evaluation problem, which can only be conducted in a decentralized fashion. In this paper, we focus on decentralized MARL policy evaluation with nonlinear function approximation, which is often seen in deep MARL. We first show that the empirical decentralized MARL policy evaluation problem can be reformulated as a decentralized nonconvex-strongly-concave minimax saddle point problem. We then develop a decentralized gradient-based descent ascent algorithm called GT-GDA that enjoys a convergence rate of $\mathcal{O}(1/T)$. To further reduce the sample complexity, we propose two decentralized stochastic optimization algorithms called GT-SRVR and GT-SRVRI, which enhance GT-GDA with variance reduction techniques. We show that all three algorithms enjoy an $\mathcal{O}(1/T)$ convergence rate to a stationary point of the reformulated minimax problem. Moreover, the fast convergence rates of GT-SRVR and GT-SRVRI imply $\mathcal{O}(\epsilon^{-2})$ communication complexity and $\mathcal{O}(m\sqrt{n}\epsilon^{-2})$ sample complexity, where $m$ is the number of agents and $n$ is the length of trajectories. To our knowledge, this paper is the first work that achieves both $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-2})$ communication complexity in decentralized policy evaluation for cooperative MARL. Our extensive experiments also corroborate the theoretical performance of our proposed decentralized policy evaluation algorithms.
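As a rough illustration of the kind of update a decentralized descent ascent method performs, the sketch below runs a generic gradient-tracking descent ascent loop on a toy problem. It is not the paper's GT-GDA: the abstract does not spell out the algorithm, so the update rule, the reading of "GT" as gradient tracking, the quadratic local objectives, the ring mixing matrix `W`, and the step sizes `alpha` and `beta` are all assumptions made for illustration. The toy objective is also convex-concave rather than nonconvex-strongly-concave, chosen only because its saddle point is easy to verify by hand.

```python
import numpy as np

def gradient_tracking_gda_toy(num_iters=5000, alpha=0.01, beta=0.01):
    """Generic gradient-tracking descent ascent on a toy 4-agent problem.

    Hypothetical local objectives (not from the paper):
        f_i(x, y) = 0.5 * a_i * x**2 + b_i * x * y - 0.5 * y**2 + c_i * x
    The agents jointly seek a saddle point of the average of the f_i:
    minimizing over x and maximizing over y.
    """
    a = np.array([0.5, 1.0, 1.5, 1.0])
    b = np.array([1.0, 0.5, 1.5, 1.0])
    c = np.array([1.0, 2.0, 3.0, 2.0])

    # Doubly stochastic mixing matrix for a 4-agent ring (assumed topology).
    W = np.array([
        [0.50, 0.25, 0.00, 0.25],
        [0.25, 0.50, 0.25, 0.00],
        [0.00, 0.25, 0.50, 0.25],
        [0.25, 0.00, 0.25, 0.50],
    ])

    def grad_x(x, y):
        # Entry i is the x-gradient of f_i at agent i's local copy.
        return a * x + b * y + c

    def grad_y(x, y):
        # Entry i is the y-gradient of f_i at agent i's local copy.
        return b * x - y

    x = np.zeros(4)  # local copies of the minimization variable
    y = np.zeros(4)  # local copies of the maximization variable
    p = grad_x(x, y)  # trackers of the average x-gradient
    d = grad_y(x, y)  # trackers of the average y-gradient

    for _ in range(num_iters):
        # Mix with neighbors, then descend in x and ascend in y
        # along the tracked gradient directions.
        x_new = W @ x - alpha * p
        y_new = W @ y + beta * d
        # Update trackers with the change in local gradients.
        p = W @ p + grad_x(x_new, y_new) - grad_x(x, y)
        d = W @ d + grad_y(x_new, y_new) - grad_y(x, y)
        x, y = x_new, y_new

    return x, y

if __name__ == "__main__":
    x, y = gradient_tracking_gda_toy()
    print("local x copies:", x)
    print("local y copies:", y)
```

For these coefficients the averaged objective has its stationary point where $\bar{a}x + \bar{b}y + \bar{c} = 0$ and $\bar{b}x - y = 0$, which gives $x = y = -1$ since $\bar{a} = \bar{b} = 1$ and $\bar{c} = 2$. Each agent communicates only with its two ring neighbors through `W`, so every iteration costs one round of neighbor communication.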

Author Information

Xin Zhang (Facebook)
Zhuqing Liu (Ohio State University)
Jia Liu (The Ohio State University)
Jia Liu

Jia (Kevin) Liu is an Assistant Professor in the Dept. of Electrical and Computer Engineering at The Ohio State University and an Amazon Visiting Academic (AVA). He received his Ph.D. degree from the Dept. of Electrical and Computer Engineering at Virginia Tech in 2010. From Aug. 2017 to Aug. 2020, he was an Assistant Professor in the Dept. of Computer Science at Iowa State University. His research areas include theoretical machine learning, stochastic network optimization and control, and performance analysis for data analytics infrastructure and cyber-physical systems. Dr. Liu is a senior member of IEEE and a member of ACM. He has received numerous awards at top venues, including the IEEE INFOCOM'19 Best Paper Award, IEEE INFOCOM'16 Best Paper Award, IEEE INFOCOM'13 Best Paper Runner-up Award, IEEE INFOCOM'11 Best Paper Runner-up Award, and IEEE ICC'08 Best Paper Award, as well as long/spotlight presentation honors at ICML, NeurIPS, and ICLR. He received an NSF CAREER Award and a Google Faculty Research Award in 2020. He also received the LAS Award for Early Achievement in Research at Iowa State University in 2020 and the Bell Labs President Gold Award. His research is supported by NSF, AFOSR, AFRL, and ONR.

Zhengyuan Zhu (Iowa State University)
Songtao Lu (IBM)

More from the Same Authors