Workshop
Cooperative AI
Natasha Jaques · Edward Hughes · Jakob Foerster · Noam Brown · Kalesha Bullard · Charlotte Smith
The human ability to cooperate in a wide range of contexts is a key ingredient in the success of our species. Problems of cooperation, in which agents seek ways to jointly improve their welfare, are ubiquitous and important. They can be found at every scale, from the daily routines of highway driving, communicating in shared language and work collaborations, to the global challenges of climate change, pandemic preparedness and international trade. With AI agents playing an ever greater role in our lives, we must endow them with similar abilities. In particular they must understand the behaviors of others, find common ground by which to communicate with them, make credible commitments, and establish institutions which promote cooperative behavior. By construction, the goal of Cooperative AI is interdisciplinary in nature. Therefore, our workshop will bring together scholars from diverse backgrounds including reinforcement learning (and inverse RL), multi-agent systems, human-AI interaction, game theory, mechanism design, social choice, fairness, cognitive science, language learning, and interpretability.

This year we will organize the workshop along two axes. First, we will discuss how to incentivize cooperation in AI systems, developing algorithms that can act effectively in general-sum settings, and which encourage others to cooperate. The second focus is on how to implement effective coordination, given that cooperation is already incentivized. For example, we may examine zero-shot coordination, in which AI agents need to coordinate with novel partners at test time. This setting is highly relevant to human-AI coordination, and provides a stepping stone for the community towards full Cooperative AI.
Schedule
Tue 5:20 a.m. - 5:30 a.m.

Welcome and Opening Remarks
(Welcome and Opening Remarks)
Edward Hughes · Natasha Jaques
Tue 5:30 a.m. - 6:00 a.m.

Invited Talk: Bo An (Nanyang Technological University) on Learning to Coordinate in Complex Environments
(Invited Talk)
Bo An
Tue 6:00 a.m. - 6:30 a.m.

Invited Talk: Michael Muthukrishna (London School of Economics) on Cultural Evolution and Human Cooperation
(Invited Talk)
In the modern world, we cooperate with and live side by side with strangers, who often look, act, and speak in ways very different to us. We work together on goals with culturally distant nations that span the globe. I'm recording this talk, but I could have given it to you in person. That's unusual in many respects. It's unusual from a cross-species perspective: comparing us to our closest primate cousins, a room full of strange chimps is a room full of dead chimps. It's unusual from a historical perspective: even a few hundred years ago, a stranger in our midst was a potential threat. And it's unusual from a geographic perspective: even today some places are safer and more cooperative than others. Cooperation varies in scale, intensity, and domain: some countries cooperate on healthcare, others on defence. Compounding the puzzle, the evolutionary mechanisms that explain cooperation undermine one another and can stabilize non-cooperative or even maladaptive behavior. I'll discuss the latest discoveries in the science of cultural evolution and human cooperation and how these might apply to the development of cooperative AI.
Michael Muthukrishna
Tue 6:30 a.m. - 7:00 a.m.

Invited Talk: Pablo Castro (Google Brain) on Estimating Policy Functions in Payment Systems using Reinforcement Learning
(Invited Talk)
In this talk I will present some of our findings (in collaboration with the Bank of Canada) on using RL to approximate the policy rules of banks participating in a high-value payments system. The objective of the agents is to learn a policy function for the choice of amount of liquidity provided to the system at the beginning of the day. Individual choices have complex strategic effects precluding a closed-form solution of the optimal policy, except in simple cases. We show that in a simplified two-agent setting, agents using reinforcement learning do learn the optimal policy that minimizes the cost of processing their individual payments. We also show that in more complex settings, both agents learn to reduce their liquidity costs. Our results show the applicability of RL to estimate best-response functions in real-world strategic games.
Pablo Samuel Castro
Tue 7:00 a.m. - 7:15 a.m.

(Live) Q&A with Invited Speaker (Bo An)
(Live Q&A)
Tue 7:15 a.m. - 7:30 a.m.

(Live) Q&A with Invited Speaker (Michael Muthukrishna)
(Live Q&A)
Tue 7:30 a.m. - 7:45 a.m.

(Live) Q&A with Invited Speaker (Pablo Castro)
(Live Q&A)
Tue 7:45 a.m. - 8:15 a.m.

Invited Talk: Ariel Procaccia (Harvard University) on Democracy and the Pursuit of Randomness
(Invited Talk)
Sortition is a storied paradigm of democracy built on the idea of choosing representatives through lotteries instead of elections. In recent years this idea has found renewed popularity in the form of citizens’ assemblies, which bring together randomly selected people from all walks of life to discuss key questions and deliver policy recommendations. A principled approach to sortition, however, must resolve the tension between two competing requirements: that the demographic composition of citizens’ assemblies reflect the general population and that every person be given a fair chance (literally) to participate. I will describe our work on designing, analyzing and implementing randomized participant selection algorithms that balance these two requirements. I will also discuss practical challenges in sortition based on experience with the adoption and deployment of our open-source system, Panelot.
Ariel Procaccia
Tue 8:15 a.m. - 8:45 a.m.

Invited Talk: Dorsa Sadigh (Stanford University) on The Role of Conventions in Adaptive Human-AI Interaction
(Invited Talk)
Today I will be talking about the role of conventions in human-AI collaboration. Conventions are norms/equilibria we build through repeated interactions with each other. The idea of conventions has been well studied in linguistics. We will start the talk by discussing the notion of linguistic conventions, and how we can build AI agents that can effectively build these conventions. We then extend the idea of linguistic conventions to conventions through actions. We discuss a modular approach to separate partner-specific conventions and rule-dependent representations. We then discuss how this can be done effectively when working with partners whose actions are high-dimensional. Finally, we extend the notion of conventions to larger-scale systems beyond dyadic interactions. Specifically, we discuss what conventions/equilibria emerge in mixed-autonomy traffic networks and how that can be leveraged for better dynamic routing of vehicles.
Dorsa Sadigh
Tue 8:45 a.m. - 9:15 a.m.

(Live) Invited Talk: Nika Haghtalab (UC Berkeley) on Collaborative Machine Learning: Training and Incentives
((Live) Invited Talk)
Many modern machine learning paradigms require large amounts of data and computation power that are rarely seen in one place or owned by one agent. In recent years, methods such as federated learning have been embraced as an approach for bringing about collaboration across learning agents. In practice, the success of these methods relies upon our ability to pool together the efforts of large numbers of individual learning agents, data set owners, and curators. In this talk, I will discuss how recruiting, serving, and retaining these agents requires us to address agents’ needs, limitations, and responsibilities. In particular, I will discuss two major questions in this field. First, how can we design collaborative learning mechanisms that benefit agents with heterogeneous learning objectives? Second, how can we ensure that the burden of data collection and learning is shared equitably between agents?
Nika Haghtalab
Tue 9:15 a.m. - 9:30 a.m.

(Live) Q&A with Invited Speaker (Ariel Procaccia)
(Live Q&A)
Tue 9:30 a.m. - 9:45 a.m.

(Live) Q&A with Invited Speaker (Dorsa Sadigh)
(Live Q&A)
Tue 9:45 a.m. - 10:00 a.m.

(Live) Q&A with Invited Speaker (Nika Haghtalab)
(Live Q&A)
Tue 10:00 a.m. - 11:00 a.m.

Workshop Poster Session 1 (hosted in GatherTown)
(Poster Sessions (GatherTown))
Tue 11:00 a.m. - 12:00 p.m.

Workshop Poster Session 2 (hosted in GatherTown)
(Poster Sessions (GatherTown))
Tue 12:00 p.m. - 1:00 p.m.

(Live) Panel Discussion: Cooperative AI
(Panel Discussion)
Kalesha Bullard · Allan Dafoe · Fei Fang · Chris Amato · Elizabeth M. Adams
Tue 1:00 p.m. - 1:15 p.m.

Spotlight Talk: Interactive Inverse Reinforcement Learning for Cooperative Games
(Spotlight Talk)
We study the problem of designing AI agents that cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov Decision Process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent's policy. How should the first agent act so it can learn the joint reward function as quickly as possible, and so that the joint policy is as close to optimal as possible? In this paper, we analyse how knowledge about the reward function can be gained. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.
Thomas Kleine Büning · Anne-Marie George · Christos Dimitrakakis
Tue 1:15 p.m. - 1:30 p.m.

Spotlight Talk: Learning to solve complex tasks by growing knowledge culturally across generations
(Spotlight Talk)
Knowledge built culturally across generations allows humans to learn far more than an individual could glean from their own experience in a lifetime. Cultural knowledge in turn rests on language: language is the richest record of what previous generations believed, valued, and practiced, and how these evolved over time. The power and mechanisms of language as a means of cultural learning, however, are not well understood, and as a result, current AI systems do not leverage language as a means for cultural knowledge transmission. Here, we take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Human participants were limited to only two attempts (two lives) to beat each game and were allowed to write a message to a future participant who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multigenerational learning followed a strikingly similar trajectory to individuals learning alone with an unlimited number of lives. These results suggest that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we pioneer here is thus a rich test bed for developing AI systems capable of acquiring and transmitting cultural knowledge.
Noah Goodman · Josh Tenenbaum · Michael Tessler · Jason Madeano
Tue 1:30 p.m. - 1:45 p.m.

Spotlight Talk: On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)
(Spotlight Talk)
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of $(1)$ joint state and action distributions across all classes, $(2)$ individual distributions of each class, and $(3)$ marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}\left(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k}\right)$, $e_2=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}}\right)$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.
Mridul Agarwal · Vaneet Aggarwal · Washim Mondal · Satish Ukkusuri
Tue 1:45 p.m. - 2:00 p.m.

Spotlight Talk: Public Information Representation for Adversarial Team Games
(Spotlight Talk)
The study of sequential games in which a team plays against an adversary is receiving increasing attention in the scientific literature. Their peculiarity resides in the asymmetric information available to the team members during play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to linear programming and column generation techniques. Such representations prevent the adoption of standard tools for generating abstractions, which have previously proved crucial when solving huge two-player zero-sum games. Differently from those works, we investigate the problem of designing a suitable game representation over which abstraction algorithms can work. In particular, our algorithms convert a sequential team game with adversaries to a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player which only knows information common to the whole team and prescribes to the players an action for any possible private state. Our conversion enables the adoption of highly scalable techniques already available for two-player zero-sum games, including techniques for generating automated abstractions. Because of the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one. To limit this explosion, we design three pruning techniques that dramatically reduce the size of the tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc Poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games on the converted games.
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti
Tue 2:00 p.m. - 2:15 p.m.

Closing Remarks
(Closing Remarks)
Gillian Hadfield