We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
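The two-timescale rule can be illustrated on the simplest special case: a zero-sum matrix game, i.e., a stochastic game with a single state and episodes of length one. The sketch below is a simplification and not the authors' algorithm. It assumes exact gradients in place of the sampled gradient estimates each player would build from its own rewards, and it uses plain projected gradient steps on the probability simplex. It also assumes that the min player (x) is the slow one, with eta_x much smaller than eta_y. The helper names (project_to_simplex, independent_gda, exploitability_x) are hypothetical. Each player updates using only the gradient of its own payoff, so the fast player roughly tracks a best response while the slow player's strategy stays near a min-max strategy.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project_to_simplex(v: Vector) -> Vector:
    """Sort-based Euclidean projection of v onto the probability simplex."""
    u = sorted(v, reverse=True)
    running, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        running += ui
        t = (running - 1.0) / (i + 1)
        if ui - t > 0:  # condition holds on a prefix; keep the last threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def independent_gda(A: Matrix, x: Vector, y: Vector, eta_x: float,
                    eta_y: float, steps: int) -> Tuple[Vector, Vector, Vector]:
    """Simultaneous projected gradient descent (x) / ascent (y) on x^T A y.

    Hypothetical simplification: exact gradients, one state, no exploration.
    Each player uses only the gradient of its own payoff; eta_x << eta_y
    makes x the slow player. Returns final x, final y, and the average of x.
    """
    m, n = len(A), len(A[0])
    x_sum = [0.0] * m
    for _ in range(steps):
        grad_x = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]  # A y
        grad_y = [sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]  # A^T x
        x = project_to_simplex([x[i] - eta_x * grad_x[i] for i in range(m)])
        y = project_to_simplex([y[j] + eta_y * grad_y[j] for j in range(n)])
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
    x_avg = [s / steps for s in x_sum]
    return x, y, x_avg


def exploitability_x(A: Matrix, x: Vector, value: float) -> float:
    """Payoff a best-responding max player earns against x, minus the game value."""
    m, n = len(A), len(A[0])
    return max(sum(x[i] * A[i][j] for i in range(m)) for j in range(n)) - value


if __name__ == "__main__":
    # Matching pennies: game value 0, unique equilibrium (1/2, 1/2) for both players.
    A = [[1.0, -1.0], [-1.0, 1.0]]
    x_last, y_last, x_avg = independent_gda(
        A, [0.8, 0.2], [0.5, 0.5], eta_x=0.001, eta_y=0.1, steps=20000)
    print("exploitability of last x:", exploitability_x(A, x_last, 0.0))
    print("exploitability of average x:", exploitability_x(A, x_avg, 0.0))
```

With eta_x = 0.001 and eta_y = 0.1, the slow player's strategy should remain close to (1/2, 1/2) on matching pennies. Setting eta_x = eta_y instead would be expected to make the iterates cycle near the boundary of the simplex, which is in line with the abstract's point that the two-timescale rule matters; this is an illustration under the stated assumptions, not a proof.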
Author Information
Constantinos Daskalakis (MIT)
Dylan Foster (MIT)
Noah Golowich (MIT)
More from the Same Authors
2021 Spotlight: Littlestone Classes are Privately Online Learnable »
Noah Golowich · Roi Livni -
2021 : Estimation of Standard Asymmetric Auction Models »
Yeshwanth Cherapanamjeri · Constantinos Daskalakis · Andrew Ilyas · Emmanouil Zampetakis -
2021 : Near-Optimal No-Regret Learning in General Games »
Constantinos Daskalakis · Maxwell Fishelson · Noah Golowich -
2022 Poster: Learning in Observable POMDPs, without Computationally Intractable Oracles »
Noah Golowich · Ankur Moitra · Dhruv Rohatgi -
2022 Poster: Interaction-Grounded Learning with Action-Inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan J Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2022 Poster: Understanding the Eluder Dimension »
Gene Li · Pritish Kamath · Dylan J Foster · Nati Srebro -
2022 Poster: On the Complexity of Adversarial Decision Making »
Dylan J Foster · Alexander Rakhlin · Ayush Sekhari · Karthik Sridharan -
2021 : Spotlight 4: Estimation of Standard Asymmetric Auction Models »
Yeshwanth Cherapanamjeri · Constantinos Daskalakis · Andrew Ilyas · Emmanouil Zampetakis -
2021 Poster: Deep Learning with Label Differential Privacy »
Badih Ghazi · Noah Golowich · Ravi Kumar · Pasin Manurangsi · Chiyuan Zhang -
2021 Poster: Near-Optimal No-Regret Learning in General Games »
Constantinos Daskalakis · Maxwell Fishelson · Noah Golowich -
2021 Poster: Littlestone Classes are Privately Online Learnable »
Noah Golowich · Roi Livni -
2021 Oral: Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination »
Dylan Foster · Akshay Krishnamurthy -
2021 Poster: Efficient Truncated Linear Regression with Unknown Noise Variance »
Constantinos Daskalakis · Patroklos Stefanou · Rui Yao · Emmanouil Zampetakis -
2021 Poster: Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination »
Dylan Foster · Akshay Krishnamurthy -
2021 Oral: Near-Optimal No-Regret Learning in General Games »
Constantinos Daskalakis · Maxwell Fishelson · Noah Golowich -
2020 Poster: Tight last-iterate convergence rates for no-regret learning in multi-player games »
Noah Golowich · Sarath Pattathil · Constantinos Daskalakis -
2020 Poster: Truncated Linear Regression in High Dimensions »
Constantinos Daskalakis · Dhruv Rohatgi · Emmanouil Zampetakis -
2020 Poster: Adapting to Misspecification in Contextual Bandits »
Dylan Foster · Claudio Gentile · Mehryar Mohri · Julian Zimmert -
2020 Poster: Constant-Expansion Suffices for Compressed Sensing with Generative Priors »
Constantinos Daskalakis · Dhruv Rohatgi · Emmanouil Zampetakis -
2020 Poster: Learning the Linear Quadratic Regulator from Nonlinear Observations »
Zakaria Mhammedi · Dylan Foster · Max Simchowitz · Dipendra Misra · Wen Sun · Akshay Krishnamurthy · Alexander Rakhlin · John Langford -
2020 Spotlight: Constant-Expansion Suffices for Compressed Sensing with Generative Priors »
Constantinos Daskalakis · Dhruv Rohatgi · Emmanouil Zampetakis -
2020 Session: Orals & Spotlights Track 11: Learning Theory »
Dylan Foster · Nicolò Cesa-Bianchi -
2020 : Real World RL with Vowpal Wabbit: Beyond Contextual Bandits »
John Langford · Marek Wydmuch · Maryam Majzoubi · Adith Swaminathan · · Dylan Foster · Paul Mineiro -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: Model Selection for Contextual Bandits »
Dylan Foster · Akshay Krishnamurthy · Haipeng Luo -
2019 Spotlight: Model Selection for Contextual Bandits »
Dylan Foster · Akshay Krishnamurthy · Haipeng Luo -
2019 Poster: Hypothesis Set Stability and Generalization »
Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan -
2018 : Improving Generative Adversarial Networks using Game Theory and Statistics »
Constantinos Daskalakis -
2018 Poster: Contextual bandits with surrogate losses: Margin bounds and efficient algorithms »
Dylan Foster · Akshay Krishnamurthy -
2018 Poster: Learning and Testing Causal Models with Interventions »
Jayadev Acharya · Arnab Bhattacharyya · Constantinos Daskalakis · Saravanan Kandasamy -
2018 Poster: Smoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons »
Nima Anari · Constantinos Daskalakis · Wolfgang Maass · Christos Papadimitriou · Amin Saberi · Santosh Vempala -
2018 Poster: HOGWILD!-Gibbs can be PanAccurate »
Constantinos Daskalakis · Nishanth Dikkala · Siddhartha Jayanti -
2018 Poster: Uniform Convergence of Gradients for Non-Convex Learning and Optimization »
Dylan Foster · Ayush Sekhari · Karthik Sridharan -
2018 Poster: The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization »
Constantinos Daskalakis · Ioannis Panageas -
2017 Poster: Spectrally-normalized margin bounds for neural networks »
Peter Bartlett · Dylan J Foster · Matus Telgarsky -
2017 Spotlight: Spectrally-normalized margin bounds for neural networks »
Peter Bartlett · Dylan J Foster · Matus Telgarsky -
2017 Poster: Parameter-Free Online Learning via Model Selection »
Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan -
2017 Spotlight: Parameter-Free Online Learning via Model Selection »
Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan -
2017 Poster: Concentration of Multilinear Functions of the Ising Model with Applications to Network Data »
Constantinos Daskalakis · Nishanth Dikkala · Gautam Kamath -
2016 Poster: Learning in Games: Robustness of Fast Convergence »
Dylan Foster · zhiyuan li · Thodoris Lykouris · Karthik Sridharan · Eva Tardos -
2015 : Discussion Panel »
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu -
2015 : Adaptive Online Learning »
Dylan Foster -
2015 Poster: Optimal Testing for Properties of Distributions »
Jayadev Acharya · Constantinos Daskalakis · Gautam Kamath -
2015 Poster: Adaptive Online Learning »
Dylan Foster · Alexander Rakhlin · Karthik Sridharan -
2015 Spotlight: Adaptive Online Learning »
Dylan Foster · Alexander Rakhlin · Karthik Sridharan -
2015 Spotlight: Optimal Testing for Properties of Distributions »
Jayadev Acharya · Constantinos Daskalakis · Gautam Kamath