Timezone: »
Algorithms designed for single-agent reinforcement learning (RL) generally fail to converge to equilibria in two-player zero-sum (2p0s) games. On the other hand, game-theoretic algorithms for approximating Nash and regularized equilibria in 2p0s games are not typically competitive for RL and can be difficult to scale. As a result, algorithms for these two cases are generally developed and evaluated separately. In this work, we show that a single algorithm---a simple extension to mirror descent with proximal regularization that we call magnetic mirror descent (MMD)---can produce strong results in both settings, despite their fundamental differences. From a theoretical standpoint, we prove that MMD converges linearly to quantal response equilibria (i.e., entropy regularized Nash equilibria) in extensive-form games---this is the first time linear convergence has been proven for a first order solver. Moreover, applied as a tabular Nash equilibrium solver via self-play, we show empirically that MMD produces results competitive with CFR in both normal-form and extensive-form games---this is the first time that a standard RL algorithm has done so. Furthermore, for single-agent deep RL, on a small collection of Atari and Mujoco tasks, we show that MMD can produce results competitive with those of PPO. Lastly, for multi-agent deep RL, we show MMD can outperform NFSP in 3x3 Abrupt Dark Hex.
Author Information
Samuel Sokota (Carnegie Mellon University)
Ryan D'Orazio (Université de Montréal)
J. Zico Kolter (Carnegie Mellon University / Bosch Center for AI)
Zico Kolter is an Assistant Professor in the School of Computer Science at Carnegie Mellon University, and also serves as Chief Scientist of AI Research for the Bosch Center for Artificial Intelligence. His work focuses on the intersection of machine learning and optimization, with a large focus on developing more robust, explainable, and rigorous methods in deep learning. In addition, he has worked on a number of application areas, highlighted by work on sustainability and smart energy systems. He is the recipient of the DARPA Young Faculty Award, and best paper awards at KDD, IJCAI, and PESGM.
Nicolas Loizou (Johns Hopkins University)
Marc Lanctot (DeepMind)
Ioannis Mitliagkas (University of Montreal)
Noam Brown (FAIR)
Christian Kroer (Columbia University)
More from the Same Authors
-
2020 : An adversarially robust approach to security-constrained optimal power flow »
Neeraj Vijay Bedmutha · Priya Donti · J. Zico Kolter -
2021 : A Fine-Tuning Approach to Belief State Modeling »
Samuel Sokota · Hengyuan Hu · David Wu · Jakob Foerster · Noam Brown -
2022 : Neural Networks Efficiently Learn Low-Dimensional Representations with SGD »
Alireza Mousavi-Hosseini · Sejun Park · Manuela Girotti · Ioannis Mitliagkas · Murat Erdogdu -
2022 : ProxSkip for Stochastic Variational Inequalities: A Federated Learning Algorithm for Provable Communication Acceleration »
Siqi Zhang · Nicolas Loizou -
2022 : Clairvoyant Regret Minimization: Equivalence with Nemirovski’s Conceptual Prox Method and Extension to General Convex Games »
Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo -
2022 : Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods »
Aleksandr Beznosikov · Eduard Gorbunov · Hugo Berard · Nicolas Loizou -
2022 : Performative Prediction with Neural Networks »
Mehrnaz Mofakhami · Ioannis Mitliagkas · Gauthier Gidel -
2022 : Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation »
Melrose Roderick · Felix Berkenkamp · Fatemeh Sheikholeslami · J. Zico Kolter -
2022 : Empirical Study on Optimizer Selection for Out-of-Distribution Generalization »
Hiroki Naganuma · Kartik Ahuja · Ioannis Mitliagkas · Shiro Takagi · Tetsuya Motokawa · Rio Yokota · Kohta Ishikawa · Ikuro Sato -
2022 : A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods »
Tiago Salvador · Kilian FATRAS · Ioannis Mitliagkas · Adam Oberman -
2022 : Denoised Smoothing with Sample Rejection for Robustifying Pretrained Classifiers »
Fatemeh Sheikholeslami · Wan-Yi Lin · Jan Hendrik Metzen · Huan Zhang · J. Zico Kolter -
2022 : Human-AI Coordination via Human-Regularized Search and Learning »
Hengyuan Hu · David Wu · Adam Lerer · Jakob Foerster · Noam Brown -
2022 : Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning »
Anton Bakhtin · David Wu · Adam Lerer · Jonathan Gray · Athul Jacob · Gabriele Farina · Alexander Miller · Noam Brown -
2022 : Uncertainty-Driven Exploration for Generalization in Reinforcement Learning »
Yiding Jiang · J. Zico Kolter · Roberta Raileanu -
2022 : Converging to Unexploitable Policies in Continuous Control Adversarial Games »
Maxwell Goldstein · Noam Brown -
2022 : Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes »
Sina Baharlouei · Fatemeh Sheikholeslami · Meisam Razaviyayn · J. Zico Kolter -
2022 : ESCHER: ESCHEWING IMPORTANCE SAMPLING IN GAMES BY COMPUTING A HISTORY VALUE FUNCTION TO ESTIMATE REGRET »
Stephen McAleer · Gabriele Farina · Marc Lanctot · Tuomas Sandholm -
2022 Workshop: Trustworthy and Socially Responsible Machine Learning »
Huan Zhang · Linyi Li · Chaowei Xiao · J. Zico Kolter · Anima Anandkumar · Bo Li -
2022 : Zico Kolter, Adapt like you train: How optimization at training time affects model finetuning and adaptation »
J. Zico Kolter -
2022 : Poster Session 1 »
Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li -
2022 Poster: Nonstationary Dual Averaging and Online Fair Allocation »
Luofeng Liao · Yuan Gao · Christian Kroer -
2022 Poster: Characterizing Datapoints via Second-Split Forgetting »
Pratyush Maini · Saurabh Garg · Zachary Lipton · J. Zico Kolter -
2022 Poster: Learning Options via Compression »
Yiding Jiang · Evan Liu · Benjamin Eysenbach · J. Zico Kolter · Chelsea Finn -
2022 Poster: Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games »
Ioannis Anagnostides · Gabriele Farina · Christian Kroer · Chung-Wei Lee · Haipeng Luo · Tuomas Sandholm -
2022 Poster: Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound »
Charles Guille-Escuret · Adam Ibrahim · Baptiste Goujaud · Ioannis Mitliagkas -
2022 Poster: Off-Team Learning »
Brandon Cui · Hengyuan Hu · Andrei Lupu · Samuel Sokota · Jakob Foerster -
2022 Poster: Self-Explaining Deviations for Coordination »
Hengyuan Hu · Samuel Sokota · David Wu · Anton Bakhtin · Andrei Lupu · Brandon Cui · Jakob Foerster -
2022 Poster: Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation »
Zhouxing Shi · Yihan Wang · Huan Zhang · J. Zico Kolter · Cho-Jui Hsieh -
2022 Poster: Test Time Adaptation via Conjugate Pseudo-labels »
Sachin Goyal · Mingjie Sun · Aditi Raghunathan · J. Zico Kolter -
2022 Poster: Optimal Efficiency-Envy Trade-Off via Optimal Transport »
Steven Yin · Christian Kroer -
2022 Poster: Deep Equilibrium Approaches to Diffusion Models »
Ashwini Pokle · Zhengyang Geng · J. Zico Kolter -
2022 Poster: Agreement-on-the-line: Predicting the Performance of Neural Networks under Distribution Shift »
Christina Baek · Yiding Jiang · Aditi Raghunathan · J. Zico Kolter -
2022 Poster: Near-Optimal No-Regret Learning Dynamics for General Convex Games »
Gabriele Farina · Ioannis Anagnostides · Haipeng Luo · Chung-Wei Lee · Christian Kroer · Tuomas Sandholm -
2022 Poster: General Cutting Planes for Bound-Propagation-Based Neural Network Verification »
Huan Zhang · Shiqi Wang · Kaidi Xu · Linyi Li · Bo Li · Suman Jana · Cho-Jui Hsieh · J. Zico Kolter -
2022 Poster: Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution »
Antonio Orvieto · Simon Lacoste-Julien · Nicolas Loizou -
2022 Poster: Path Independent Equilibrium Models Can Better Exploit Test-Time Computation »
Cem Anil · Ashwini Pokle · Kaiqu Liang · Johannes Treutlein · Yuhuai Wu · Shaojie Bai · J. Zico Kolter · Roger Grosse -
2022 Poster: The Pitfalls of Regularization in Off-Policy TD Learning »
Gaurav Manek · J. Zico Kolter -
2021 : Panel B: Safe Learning and Decision Making in Uncertain and Unstructured Environments »
Yisong Yue · J. Zico Kolter · Ivan Dario D Jimenez Rodriguez · Dragos Margineantu · Animesh Garg · Melissa Greeff -
2021 : Enforcing Robustness for Neural Network Policies »
J. Zico Kolter -
2021 Workshop: Cooperative AI »
Natasha Jaques · Edward Hughes · Jakob Foerster · Noam Brown · Kalesha Bullard · Charlotte Smith -
2021 Poster: Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Neural Network Robustness Verification »
Shiqi Wang · Huan Zhang · Kaidi Xu · Xue Lin · Suman Jana · Cho-Jui Hsieh · J. Zico Kolter -
2021 Poster: Joint inference and input optimization in equilibrium networks »
Swaminathan Gurumurthy · Shaojie Bai · Zachary Manchester · J. Zico Kolter -
2021 Poster: Last-iterate Convergence in Extensive-Form Games »
Chung-Wei Lee · Christian Kroer · Haipeng Luo -
2021 Poster: $(\textrm{Implicit})^2$: Implicit Layers for Implicit Representations »
Zhichun Huang · Shaojie Bai · J. Zico Kolter -
2021 Poster: Boosted CVaR Classification »
Runtian Zhai · Chen Dan · Arun Suggala · J. Zico Kolter · Pradeep Ravikumar -
2021 Poster: Online Market Equilibrium with Application to Fair Division »
Yuan Gao · Alex Peysakhovich · Christian Kroer -
2021 Poster: Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving »
Julien Grand-Clément · Christian Kroer -
2021 Poster: Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds »
Yujia Huang · Huan Zhang · Yuanyuan Shi · J. Zico Kolter · Anima Anandkumar -
2021 Poster: Adversarially robust learning for security-constrained optimal power flow »
Priya Donti · Aayushya Agarwal · Neeraj Vijay Bedmutha · Larry Pileggi · J. Zico Kolter -
2021 Poster: Robustness between the worst and average case »
Leslie Rice · Anna Bair · Huan Zhang · J. Zico Kolter -
2021 Poster: Monte Carlo Tree Search With Iteratively Refining State Abstractions »
Samuel Sokota · Caleb Y Ho · Zaheen Ahmad · J. Zico Kolter -
2020 : Invited Talk (Zico Kolter) »
J. Zico Kolter -
2020 Workshop: Machine Learning for Engineering Modeling, Simulation and Design »
Alex Beatson · Priya Donti · Amira Abdel-Rahman · Stephan Hoyer · Rose Yu · J. Zico Kolter · Ryan Adams -
2020 : Keynote by Zico Kolter »
J. Zico Kolter -
2020 Poster: Community detection using fast low-cardinality semidefinite programming
»
Po-Wei Wang · J. Zico Kolter -
2020 Poster: Deep Archimedean Copulas »
Chun Kai Ling · Fei Fang · J. Zico Kolter -
2020 Tutorial: (Track3) Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization Q&A »
David Duvenaud · J. Zico Kolter · Matthew Johnson -
2020 Poster: Evaluating and Rewarding Teamwork Using Cooperative Game Abstractions »
Tom Yan · Christian Kroer · Alexander Peysakhovich -
2020 Poster: First-Order Methods for Large-Scale Market Equilibrium Computation »
Yuan Gao · Christian Kroer -
2020 Poster: Efficient semidefinite-programming-based inference for binary and multi-class MRFs »
Chirag Pabbaraju · Po-Wei Wang · J. Zico Kolter -
2020 Spotlight: Efficient semidefinite-programming-based inference for binary and multi-class MRFs »
Chirag Pabbaraju · Po-Wei Wang · J. Zico Kolter -
2020 Poster: Multiscale Deep Equilibrium Models »
Shaojie Bai · Vladlen Koltun · J. Zico Kolter -
2020 Poster: Denoised Smoothing: A Provable Defense for Pretrained Classifiers »
Hadi Salman · Mingjie Sun · Greg Yang · Ashish Kapoor · J. Zico Kolter -
2020 Poster: Monotone operator equilibrium networks »
Ezra Winston · J. Zico Kolter -
2020 Spotlight: Monotone operator equilibrium networks »
Ezra Winston · J. Zico Kolter -
2020 Oral: Multiscale Deep Equilibrium Models »
Shaojie Bai · Vladlen Koltun · J. Zico Kolter -
2020 Tutorial: (Track3) Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization »
David Duvenaud · J. Zico Kolter · Matthew Johnson -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 Workshop: Bridging Game Theory and Deep Learning »
Ioannis Mitliagkas · Gauthier Gidel · Niao He · Reyhane Askari Hemmat · N H · Nika Haghtalab · Simon Lacoste-Julien -
2019 Poster: Learning Stable Deep Dynamics Models »
J. Zico Kolter · Gaurav Manek -
2019 Poster: Adversarial Music: Real world Audio Adversary against Wake-word Detection System »
Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze -
2019 Spotlight: Adversarial Music: Real world Audio Adversary against Wake-word Detection System »
Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze -
2019 Poster: Differentiable Convex Optimization Layers »
Akshay Agrawal · Brandon Amos · Shane Barratt · Stephen Boyd · Steven Diamond · J. Zico Kolter -
2019 Poster: Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions »
Gabriele Farina · Christian Kroer · Tuomas Sandholm -
2019 Poster: Robust Multi-agent Counterfactual Prediction »
Alexander Peysakhovich · Christian Kroer · Adam Lerer -
2019 Poster: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2019 Spotlight: Reducing the variance in online optimization by transporting past gradients »
Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux -
2019 Poster: Uniform convergence may be unable to explain generalization in deep learning »
Vaishnavh Nagarajan · J. Zico Kolter -
2019 Poster: Deep Equilibrium Models »
Shaojie Bai · J. Zico Kolter · Vladlen Koltun -
2019 Spotlight: Deep Equilibrium Models »
Shaojie Bai · J. Zico Kolter · Vladlen Koltun -
2019 Oral: Uniform convergence may be unable to explain generalization in deep learning »
Vaishnavh Nagarajan · J. Zico Kolter -
2018 : Talk 1: Zico Kolter - Differentiable Physics and Control »
J. Zico Kolter -
2018 Poster: Differentiable MPC for End-to-end Planning and Control »
Brandon Amos · Ivan Jimenez · Jacob I Sacks · Byron Boots · J. Zico Kolter -
2018 Poster: End-to-End Differentiable Physics for Learning and Control »
Filipe de Avila Belbute Peres · Kevin Smith · Kelsey Allen · Josh Tenenbaum · J. Zico Kolter -
2018 Spotlight: End-to-End Differentiable Physics for Learning and Control »
Filipe de Avila Belbute Peres · Kevin Smith · Kelsey Allen · Josh Tenenbaum · J. Zico Kolter -
2018 Poster: Scaling provable adversarial defenses »
Eric Wong · Frank Schmidt · Jan Hendrik Metzen · J. Zico Kolter -
2018 Tutorial: Adversarial Robustness: Theory and Practice »
J. Zico Kolter · Aleksander Madry -
2017 : Provable defenses against adversarial examples via the convex outer adversarial polytope »
J. Zico Kolter -
2017 Poster: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning »
Marc Lanctot · Vinicius Zambaldi · Audrunas Gruslys · Angeliki Lazaridou · Karl Tuyls · Julien Perolat · David Silver · Thore Graepel -
2017 Poster: Gradient descent GAN optimization is locally stable »
Vaishnavh Nagarajan · J. Zico Kolter -
2017 Oral: Gradient descent GAN optimization is locally stable »
Vaishnavh Nagarajan · J. Zico Kolter -
2017 Demonstration: Libratus: Beating Top Humans in No-Limit Poker »
Noam Brown · Tuomas Sandholm -
2017 Poster: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm -
2017 Oral: Safe and Nested Subgame Solving for Imperfect-Information Games »
Noam Brown · Tuomas Sandholm -
2017 Poster: Task-based End-to-end Model Learning in Stochastic Optimization »
Priya Donti · J. Zico Kolter · Brandon Amos -
2016 Poster: The Multiple Quantile Graphical Model »
Alnur Ali · J. Zico Kolter · Ryan Tibshirani -
2013 Workshop: Machine Learning for Sustainability »
Edwin Bonilla · Thomas Dietterich · Theodoros Damoulas · Andreas Krause · Daniel Sheldon · Iadine Chades · J. Zico Kolter · Bistra Dilkina · Carla Gomes · Hugo P Simao -
2011 Workshop: Machine Learning for Sustainability »
Thomas Dietterich · J. Zico Kolter · Matthew A Brown -
2011 Poster: The Fixed Points of Off-Policy TD »
J. Zico Kolter -
2011 Spotlight: The Fixed Points of Off-Policy TD »
J. Zico Kolter -
2010 Poster: Energy Disaggregation via Discriminative Sparse Coding »
J. Zico Kolter · Siddarth Batra · Andrew Y Ng -
2009 Mini Symposium: Machine Learning for Sustainability »
J. Zico Kolter · Thomas Dietterich · Andrew Y Ng -
2007 Spotlight: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion »
J. Zico Kolter · Pieter Abbeel · Andrew Y Ng -
2007 Poster: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion »
J. Zico Kolter · Pieter Abbeel · Andrew Y Ng