The human ability to cooperate in a wide range of contexts is a key ingredient in the success of our species. Problems of cooperation—in which agents seek ways to jointly improve their welfare—are ubiquitous and important. They can be found at every scale, from the daily routines of highway driving, communicating in a shared language, and collaborating at work, to the global challenges of climate change, pandemic preparedness, and international trade. With AI agents playing an ever greater role in our lives, we must endow them with similar abilities. In particular, they must understand the behaviors of others, find common ground by which to communicate with them, make credible commitments, and establish institutions that promote cooperative behavior.

By construction, the goal of Cooperative AI is interdisciplinary in nature. Our workshop will therefore bring together scholars from diverse backgrounds, including reinforcement learning (and inverse RL), multi-agent systems, human-AI interaction, game theory, mechanism design, social choice, fairness, cognitive science, language learning, and interpretability.

This year we will organize the workshop along two axes. First, we will discuss how to incentivize cooperation in AI systems: developing algorithms that can act effectively in general-sum settings and that encourage others to cooperate. Second, we will focus on how to implement effective coordination, given that cooperation is already incentivized. For example, we may examine zero-shot coordination, in which AI agents need to coordinate with novel partners at test time. This setting is highly relevant to human-AI coordination, and provides a stepping stone for the community towards full Cooperative AI.
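To make the first axis concrete, here is a minimal sketch of a general-sum setting: a Prisoner's Dilemma-style matrix game in which individual best responses pull away from the jointly better outcome. All payoffs are illustrative and not drawn from any workshop paper.

```python
import numpy as np

# A two-player general-sum matrix game (a Prisoner's Dilemma variant).
# payoffs[a1, a2] = (reward to player 1, reward to player 2);
# action 0 = cooperate, action 1 = defect. Values are hypothetical.
payoffs = np.array([
    [[3, 3], [0, 4]],
    [[4, 0], [1, 1]],
])

def best_response(player, opponent_action):
    """Return the action maximizing `player`'s payoff against a fixed opponent action."""
    if player == 0:
        return int(np.argmax(payoffs[:, opponent_action, 0]))
    return int(np.argmax(payoffs[opponent_action, :, 1]))

# Defection is each player's best response to defection, so mutual defection
# is an equilibrium even though mutual cooperation pays both players more --
# exactly the incentive problem the workshop's first axis targets.
assert best_response(0, 1) == 1 and best_response(1, 1) == 1
assert payoffs[0, 0, 0] > payoffs[1, 1, 0]
```
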
Tue 5:20 a.m. - 5:30 a.m.
Welcome and Opening Remarks
Edward Hughes · Natasha Jaques

Tue 5:30 a.m. - 6:00 a.m.
Invited Talk: Bo An (Nanyang Technological University) on Learning to Coordinate in Complex Environments
Bo An

Tue 6:00 a.m. - 6:30 a.m.
Invited Talk: Michael Muthukrishna (London School of Economics) on Cultural Evolution and Human Cooperation
In the modern world, we cooperate with and live side by side with strangers, who often look, act, and speak in ways very different to us. We work together on goals with culturally distant nations that span the globe. I'm recording this talk, but I could have given it to you in person. That's unusual in many respects. It's unusual from a cross-species perspective: comparing us to our closest primate cousins, a room full of strange chimps is a room full of dead chimps. It's unusual from a historical perspective: even a few hundred years ago, a stranger in our midst was a potential threat. And it's unusual from a geographic perspective: even today some places are safer and more cooperative than others. Cooperation varies in scale, intensity, and domain: some countries cooperate on healthcare, others on defence. Compounding the puzzle, the evolutionary mechanisms that explain cooperation undermine one another and can stabilize non-cooperative or even maladaptive behavior. I'll discuss the latest discoveries in the science of cultural evolution and human cooperation and how these might apply to the development of cooperative AI.
Michael Muthukrishna

Tue 6:30 a.m. - 7:00 a.m.
Invited Talk: Pablo Castro (Google Brain) on Estimating Policy Functions in Payment Systems using Reinforcement Learning
In this talk I will present some of our findings (in collaboration with the Bank of Canada) on using RL to approximate the policy rules of banks participating in a high-value payments system. The objective of the agents is to learn a policy function for the choice of amount of liquidity provided to the system at the beginning of the day. Individual choices have complex strategic effects, precluding a closed-form solution of the optimal policy except in simple cases. We show that in a simplified two-agent setting, agents using reinforcement learning do learn the optimal policy that minimizes the cost of processing their individual payments. We also show that in more complex settings, both agents learn to reduce their liquidity costs. Our results show the applicability of RL to estimate best-response functions in real-world strategic games.
Pablo Samuel Castro

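As a rough, hypothetical sketch of the kind of setup described above (not the Bank of Canada model itself), two independent learners each choose a morning liquidity level and settle on a low-cost profile via a simple bandit-style update. The cost function and all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
LEVELS = np.linspace(0.0, 1.0, 11)   # candidate liquidity fractions (hypothetical)

def cost(own, other):
    # Stylized, invented cost: holding liquidity is expensive, but providing
    # too little (relative to payments owed) incurs delay penalties that
    # depend on how much liquidity the other bank provides.
    holding = 0.5 * own
    delay = max(0.0, 0.8 - own - 0.3 * other)
    return holding + delay

# Independent epsilon-greedy learners over the discrete liquidity levels.
q = np.zeros((2, len(LEVELS)))
counts = np.zeros_like(q)
for t in range(20000):
    acts = [
        rng.integers(len(LEVELS)) if rng.random() < 0.1 else int(np.argmin(q[i]))
        for i in range(2)
    ]
    for i in range(2):
        c = cost(LEVELS[acts[i]], LEVELS[acts[1 - i]])
        counts[i, acts[i]] += 1
        q[i, acts[i]] += (c - q[i, acts[i]]) / counts[i, acts[i]]  # running mean of cost

print([LEVELS[int(np.argmin(q[i]))] for i in range(2)])  # learned liquidity choices
```
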
Tue 7:00 a.m. - 7:15 a.m.
(Live) Q&A with Invited Speaker (Bo An)

Tue 7:15 a.m. - 7:30 a.m.
(Live) Q&A with Invited Speaker (Michael Muthukrishna)

Tue 7:30 a.m. - 7:45 a.m.
(Live) Q&A with Invited Speaker (Pablo Castro)

Tue 7:45 a.m. - 8:15 a.m.
Invited Talk: Ariel Procaccia (Harvard University) on Democracy and the Pursuit of Randomness
Sortition is a storied paradigm of democracy built on the idea of choosing representatives through lotteries instead of elections. In recent years this idea has found renewed popularity in the form of citizens' assemblies, which bring together randomly selected people from all walks of life to discuss key questions and deliver policy recommendations. A principled approach to sortition, however, must resolve the tension between two competing requirements: that the demographic composition of citizens' assemblies reflect the general population and that every person be given a fair chance (literally) to participate. I will describe our work on designing, analyzing and implementing randomized participant selection algorithms that balance these two requirements. I will also discuss practical challenges in sortition based on experience with the adoption and deployment of our open-source system, Panelot.
Ariel Procaccia

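A minimal sketch of the tension the talk describes: choose a panel that satisfies demographic quotas while keeping selection chances fair. The rejection-sampling approach below is uniform over quota-satisfying panels, so people in the same group are treated identically; it is an illustrative simplification, not the algorithm behind Panelot (that work optimizes individuals' minimum selection probability directly). Pool, quotas, and sizes are hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical pool: 70 urban and 30 rural volunteers.
pool = [{"id": i, "group": "urban" if i < 70 else "rural"} for i in range(100)]
quotas = {"urban": (6, 8), "rural": (2, 4)}  # (min, max) seats per group
PANEL_SIZE = 10

def satisfies_quotas(panel):
    counts = {}
    for person in panel:
        counts[person["group"]] = counts.get(person["group"], 0) + 1
    return all(lo <= counts.get(g, 0) <= hi for g, (lo, hi) in quotas.items())

def sample_panel():
    # Rejection sampling: the result is uniform over all quota-satisfying
    # panels, so any two people in the same group have identical chances.
    while True:
        panel = rng.sample(pool, PANEL_SIZE)
        if satisfies_quotas(panel):
            return panel

print(sorted(p["id"] for p in sample_panel()))
```
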
Tue 8:15 a.m. - 8:45 a.m.
Invited Talk: Dorsa Sadigh (Stanford University) on The Role of Conventions in Adaptive Human-AI Interaction
Today I will be talking about the role of conventions in human-AI collaboration. Conventions are norms/equilibria we build through repeated interactions with each other. The idea of conventions has been well studied in linguistics. We will start the talk by discussing the notion of linguistic conventions, and how we can build AI agents that can effectively form these conventions. We then extend the idea of linguistic conventions to conventions expressed through actions. We discuss a modular approach that separates partner-specific conventions from rule-dependent representations. We then discuss how this can be done effectively when working with partners whose actions are high-dimensional. Finally, we extend the notion of conventions to larger-scale systems beyond dyadic interactions. Specifically, we discuss what conventions/equilibria emerge in mixed-autonomy traffic networks and how they can be leveraged for better dynamic routing of vehicles.
Dorsa Sadigh

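One way to picture the modular separation mentioned in the abstract is a shared rule module feeding partner-specific heads. The sketch below is an invented toy architecture, not the design from the talk's papers; all classes, dimensions, and partner names are hypothetical.

```python
import numpy as np

class RuleModule:
    """Shared task representation: the same for every partner (hypothetical)."""
    def __init__(self, obs_dim, hid_dim, rng):
        self.w = rng.normal(size=(obs_dim, hid_dim))

    def __call__(self, obs):
        return np.tanh(obs @ self.w)

class ConventionModule:
    """Partner-specific head: swapped out when the partner changes."""
    def __init__(self, hid_dim, n_actions, rng):
        self.w = rng.normal(size=(hid_dim, n_actions))

    def __call__(self, h):
        return h @ self.w  # action logits

rng = np.random.default_rng(0)
rules = RuleModule(obs_dim=8, hid_dim=16, rng=rng)            # learned once, reused
partner_heads = {p: ConventionModule(16, 4, rng) for p in ["alice", "bob"]}

obs = rng.normal(size=8)
# Adapting to a new partner means fitting only the small convention head,
# while the rule-dependent representation is shared across partners.
logits = partner_heads["alice"](rules(obs))
action = int(np.argmax(logits))
```
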
Tue 8:45 a.m. - 9:15 a.m.
(Live) Invited Talk: Nika Haghtalab (UC Berkeley) on Collaborative Machine Learning: Training and Incentives
Many modern machine learning paradigms require large amounts of data and computation power that are rarely seen in one place or owned by one agent. In recent years, methods such as federated learning have been embraced as an approach for bringing about collaboration across learning agents. In practice, the success of these methods relies upon our ability to pool together the efforts of large numbers of individual learning agents, data set owners, and curators. In this talk, I will discuss how recruiting, serving, and retaining these agents requires us to address agents' needs, limitations, and responsibilities. In particular, I will discuss two major questions in this field. First, how can we design collaborative learning mechanisms that benefit agents with heterogeneous learning objectives? Second, how can we ensure that the burden of data collection and learning is shared equitably between agents?
Nika Haghtalab

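The federated-learning pattern the talk builds on can be sketched as follows: generic federated averaging on a toy linear-regression problem, with invented data and hyperparameters. It shows the baseline mechanism only, not the talk's incentive-aware designs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each agent holds its own linear-regression data.
def make_agent_data(n):
    x = rng.normal(size=(n, 3))
    y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
    return x, y

agents = [make_agent_data(n) for n in (50, 200, 1000)]  # unequal data burdens
w = np.zeros(3)  # global model

for round_ in range(100):
    updates, sizes = [], []
    for x, y in agents:
        local = w.copy()
        for _ in range(5):  # a few local gradient steps on the agent's data
            grad = 2 * x.T @ (x @ local - y) / len(y)
            local -= 0.05 * grad
        updates.append(local)
        sizes.append(len(y))
    # Server aggregates via a data-weighted average. Agents with more data
    # dominate the model -- one root of the fairness questions raised above.
    w = np.average(updates, axis=0, weights=sizes)

print(np.round(w, 2))  # should approach the true weights [1.0, -2.0, 0.5]
```
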
Tue 9:15 a.m. - 9:30 a.m.
(Live) Q&A with Invited Speaker (Ariel Procaccia)

Tue 9:30 a.m. - 9:45 a.m.
(Live) Q&A with Invited Speaker (Dorsa Sadigh)

Tue 9:45 a.m. - 10:00 a.m.
(Live) Q&A with Invited Speaker (Nika Haghtalab)

Tue 10:00 a.m. - 11:00 a.m.
Workshop Poster Session 1 (hosted in GatherTown)

Tue 11:00 a.m. - 12:00 p.m.
Workshop Poster Session 2 (hosted in GatherTown)

Tue 12:00 p.m. - 1:00 p.m.
(Live) Panel Discussion: Cooperative AI
Kalesha Bullard · Allan Dafoe · Fei Fang · Chris Amato · Elizabeth M. Adams

Tue 1:00 p.m. - 1:15 p.m.
Spotlight Talk: Interactive Inverse Reinforcement Learning for Cooperative Games
We study the problem of designing AI agents that cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent acts so as to maximise expected utility given the first agent's policy. How should the first agent act so that it learns the joint reward function as quickly as possible, and so that the joint policy is as close to optimal as possible? In this paper, we analyse how knowledge about the reward function can be gained. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.
Thomas Kleine Büning · Anne-Marie George · Christos Dimitrakakis

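The Stackelberg structure in the abstract (a leader commits to a policy, the follower best-responds to it) is easy to see in a one-shot cooperative matrix game. The sketch below is a toy stand-in for the paper's episodic MDP, with an invented, fully known reward matrix; it omits the paper's central complication that the rewards must be learned.

```python
import numpy as np

# Joint reward matrix R[a_leader, a_follower], shared by both agents
# (a cooperative game); values are hypothetical.
R = np.array([
    [4.0, 0.0],
    [3.0, 3.5],
])

def follower_best_response(leader_action):
    # The follower observes the leader's committed action and maximizes reward.
    return int(np.argmax(R[leader_action]))

def leader_value(leader_action):
    # The leader evaluates each commitment under the follower's best response.
    return R[leader_action, follower_best_response(leader_action)]

best = max(range(R.shape[0]), key=leader_value)
print(best, leader_value(best))  # leader commits to action 0, joint value 4.0
```
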
Tue 1:15 p.m. - 1:30 p.m.
Spotlight Talk: Learning to solve complex tasks by growing knowledge culturally across generations
Knowledge built culturally across generations allows humans to learn far more than an individual could glean from their own experience in a lifetime. Cultural knowledge in turn rests on language: language is the richest record of what previous generations believed, valued, and practiced, and how these evolved over time. The power and mechanisms of language as a means of cultural learning, however, are not well understood, and as a result, current AI systems do not leverage language as a means for cultural knowledge transmission. Here, we take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Human participants were limited to only two attempts (two lives) to beat each game and were allowed to write a message to a future participant who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multigenerational learning followed a strikingly similar trajectory to individuals learning alone with an unlimited number of lives. These results suggest that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we pioneer here is thus a rich test bed for developing AI systems capable of acquiring and transmitting cultural knowledge.
Noah Goodman · Josh Tenenbaum · Michael Tessler · Jason Madeano

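The iterated-learning paradigm in the abstract (limited lives per participant, with a written message passed to the next generation) can be caricatured with a toy game in which the transmitted knowledge is simply the set of known failures. Everything below is an invented model of the protocol, not the paper's video games.

```python
import random

rng = random.Random(0)
N_DOORS, SAFE_DOOR, LIVES = 12, 7, 2  # hypothetical toy game: find the safe door

def play_generation(known_bad):
    """One participant: two attempts, guided by the predecessor's message."""
    for _ in range(LIVES):
        guess = rng.choice([d for d in range(N_DOORS) if d not in known_bad])
        if guess == SAFE_DOOR:
            return True, known_bad
        known_bad = known_bad | {guess}  # learned from a failed attempt
    return False, known_bad

# The "message" passed between generations is the set of known-bad doors.
message, successes = frozenset(), []
for gen in range(10):
    won, learned = play_generation(message)
    successes.append(won)
    message = frozenset(learned)  # written down for the next participant

print(successes)  # later generations succeed more often as knowledge accumulates
```
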
Tue 1:30 p.m. - 1:45 p.m.
Spotlight Talk: On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.
Mridul Agarwal · Vaneet Aggarwal · Washim Mondal · Satish Ukkusuri

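To get a feel for the first two bounds, the snippet below evaluates their scaling numerically with the hidden constants dropped: $e_1$ shrinks as the whole population grows, while $e_2$ is dominated by the smallest class. The values chosen for $|\mathcal{X}|$, $|\mathcal{U}|$, and the class sizes are arbitrary.

```python
import math

X, U = 10, 5            # |X|, |U|: per-agent state and action space sizes
root_xu = math.sqrt(X * U)

def e1(class_sizes):
    # e_1 ~ sqrt(|X||U|) / N_pop * sum_k sqrt(N_k), constants dropped
    n_pop = sum(class_sizes)
    return root_xu / n_pop * sum(math.sqrt(n) for n in class_sizes)

def e2(class_sizes):
    # e_2 ~ sqrt(|X||U|) * sum_k 1/sqrt(N_k), constants dropped
    return root_xu * sum(1 / math.sqrt(n) for n in class_sizes)

for sizes in [(100, 100), (1000, 1000), (10000, 10000)]:
    print(sizes, round(e1(sizes), 4), round(e2(sizes), 4))
# e1 shrinks like 1/sqrt(N_pop) as classes grow together, while e2 is driven
# by the smallest class: a huge population with one tiny class stays hard.
print(round(e2((10000, 4)), 4))
```
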
Tue 1:45 p.m. - 2:00 p.m.
Spotlight Talk: Public Information Representation for Adversarial Team Games
The study of sequential games in which a team plays against an adversary is receiving increasing attention in the scientific literature. Their peculiarity resides in the asymmetric information available to the team members during play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to linear programming and column generation techniques. Such representations prevent the adoption of standard tools for generating abstractions, which previously proved crucial for solving huge two-player zero-sum games. Unlike those works, we investigate the problem of designing a suitable game representation over which abstraction algorithms can work. In particular, our algorithms convert a sequential team game with adversaries into a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player which only knows information common to the whole team and prescribes an action to the players for any possible private state. Our conversion enables the adoption of highly scalable techniques already available for two-player zero-sum games, including techniques for generating automated abstractions. Because of the NP-hard nature of the problem, the resulting public team game may be exponentially larger than the original one. To limit this explosion, we design three pruning techniques that dramatically reduce the size of the tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games to the converted games.
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti

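The conversion at the heart of the abstract replaces the team with a coordinator whose action space is the set of prescriptions: one action per (member, private state) pair. The toy enumeration below uses invented member names and state/action sets, and makes the exponential blow-up the abstract mentions visible.

```python
from itertools import product

# Toy team: two members, each with private states and a small action set.
team_private_states = {"member1": ["s1", "s2"], "member2": ["t1", "t2", "t3"]}
team_actions = {"member1": ["a", "b"], "member2": ["x", "y"]}

def coordinator_actions():
    """Each coordinator action prescribes an action for every (member, private state)."""
    slots = [(m, s) for m, states in team_private_states.items() for s in states]
    choices = [team_actions[m] for m, _ in slots]
    for assignment in product(*choices):
        yield dict(zip(slots, assignment))

prescriptions = list(coordinator_actions())
# member1: 2 actions over 2 private states, member2: 2 actions over 3 states:
# 2**2 * 2**3 = 32 coordinator actions -- exponential in the private states.
print(len(prescriptions))   # 32
print(prescriptions[0])     # e.g. {('member1', 's1'): 'a', ...}
```
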
Tue 2:00 p.m. - 2:15 p.m.
Closing Remarks
Gillian Hadfield

Author Information
Natasha Jaques (UC Berkeley)
Edward Hughes (DeepMind)
Jakob Foerster (University of Oxford)
Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.
Noam Brown (Facebook AI Research)
Kalesha Bullard (DeepMind)
Charlotte Smith (DeepMind)