Timezone: »
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being even oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature of learning in games. In this paper, we develop a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy; when both agents adopt the learning dynamics, they converge to the Nash equilibrium of the game. The key challenge in this decentralized setting is the non-stationarity of the environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policies simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale.
Author Information
Muhammed Sayin (Bilkent University)
Kaiqing Zhang (MIT)
David Leslie (Lancaster University)
Tamer Basar (University of Illinois at Urbana-Champaign)
Asuman Ozdaglar (Massachusetts Institute of Technology)
Asu Ozdaglar received the B.S. degree in electrical engineering from the Middle East Technical University, Ankara, Turkey, in 1996, and the S.M. and the Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1998 and 2003, respectively. She is currently a professor in the Electrical Engineering and Computer Science Department at the Massachusetts Institute of Technology. She is also the director of the Laboratory for Information and Decision Systems. Her research expertise includes optimization theory, with emphasis on nonlinear programming and convex analysis, game theory, with applications in communication, social, and economic networks, distributed optimization and control, and network analysis with special emphasis on contagious processes, systemic risk and dynamic control. Professor Ozdaglar is the recipient of a Microsoft fellowship, the MIT Graduate Student Council Teaching award, the NSF Career award, the 2008 Donald P. Eckman award of the American Automatic Control Council, the Class of 1943 Career Development Chair, the inaugural Steven and Renee Innovation Fellowship, and the 2014 Spira teaching award. She served on the Board of Governors of the Control System Society in 2010 and was an associate editor for IEEE Transactions on Automatic Control. She is currently the area co-editor for a new area for the journal Operations Research, entitled "Games, Information and Networks. She is the co-author of the book entitled âConvex Analysis and Optimizationâ (Athena Scientific, 2003).
More from the Same Authors
-
2022 : Smoothed-SGDmax: A Stability-Inspired Algorithm to Improve Adversarial Generalization »
Jiancong Xiao · Jiawei Zhang · Zhiquan Luo · Asuman Ozdaglar -
2022 Spotlight: A Mean-Field Game Approach to Cloud Resource Management with Function Approximation »
Weichao Mao · Haoran Qiu · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Ravishankar Iyer · Tamer Basar -
2022 Poster: What is a Good Metric to Study Generalization of Minimax Learners? »
Asuman Ozdaglar · Sarath Pattathil · Jiawei Zhang · Kaiqing Zhang -
2022 Poster: Bridging Central and Local Differential Privacy in Data Acquisition Mechanisms »
Alireza Fallah · Ali Makhdoumi · azarakhsh malekian · Asuman Ozdaglar -
2022 Poster: A Mean-Field Game Approach to Cloud Resource Management with Function Approximation »
Weichao Mao · Haoran Qiu · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Ravishankar Iyer · Tamer Basar -
2021 : Q&A with Professor Asu Ozdaglar »
Asuman Ozdaglar -
2021 : Keynote Talk: Personalization in Federated Learning: Adaptation and Clustering (Asu Ozdaglar) »
Asuman Ozdaglar -
2021 Poster: Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity »
Kaiqing Zhang · Xiangyuan Zhang · Bin Hu · Tamer Basar -
2021 Poster: Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks »
Alireza Fallah · Aryan Mokhtari · Asuman Ozdaglar -
2021 Poster: On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning »
Alireza Fallah · Kristian Georgiev · Aryan Mokhtari · Asuman Ozdaglar -
2020 Poster: BOSS: Bayesian Optimization over String Spaces »
Henry Moss · David Leslie · Daniel Beck · Javier González · Paul Rayson -
2020 Poster: Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach »
Alireza Fallah · Aryan Mokhtari · Asuman Ozdaglar -
2020 Spotlight: BOSS: Bayesian Optimization over String Spaces »
Henry Moss · David Leslie · Daniel Beck · Javier González · Paul Rayson -
2020 Poster: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods »
Yanli Liu · Kaiqing Zhang · Tamer Basar · Wotao Yin -
2020 Poster: POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis »
Weichao Mao · Kaiqing Zhang · Qiaomin Xie · Tamer Basar -
2020 Poster: On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems »
Kaiqing Zhang · Bin Hu · Tamer Basar -
2020 Poster: Robust Multi-Agent Reinforcement Learning with Model Uncertainty »
Kaiqing Zhang · TAO SUN · Yunzhe Tao · Sahika Genc · Sunil Mallya · Tamer Basar -
2020 Poster: Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes »
Dongsheng Ding · Kaiqing Zhang · Tamer Basar · Mihailo Jovanovic -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: A Universally Optimal Multistage Accelerated Stochastic Gradient Method »
Necdet Serhat Aybat · Alireza Fallah · Mert Gurbuzbalaban · Asuman Ozdaglar -
2019 Poster: Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games »
Kaiqing Zhang · Zhuoran Yang · Tamer Basar -
2019 Poster: Non-Cooperative Inverse Reinforcement Learning »
Xiangyuan Zhang · Kaiqing Zhang · Erik Miehling · Tamer Basar -
2018 Poster: Escaping Saddle Points in Constrained Optimization »
Aryan Mokhtari · Asuman Ozdaglar · Ali Jadbabaie -
2018 Poster: Bandit Learning in Concave N-Person Games »
Mario Bravo · David Leslie · Panayotis Mertikopoulos -
2018 Spotlight: Escaping Saddle Points in Constrained Optimization »
Aryan Mokhtari · Asuman Ozdaglar · Ali Jadbabaie -
2017 Poster: When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent »
Mert Gurbuzbalaban · Asuman Ozdaglar · Pablo A Parrilo · Nuri Vanli -
2017 Spotlight: When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent »
Mert Gurbuzbalaban · Asuman Ozdaglar · Pablo A Parrilo · Nuri Vanli -
2015 Invited Talk: Incremental Methods for Additive Cost Convex Optimization »
Asuman Ozdaglar -
2013 Poster: Computing the Stationary Distribution Locally »
Christina Lee · Asuman Ozdaglar · Devavrat Shah