
Workshop
Cooperative AI
Natasha Jaques · Edward Hughes · Jakob Foerster · Noam Brown · Kalesha Bullard · Charlotte Smith

Tue Dec 14 05:20 AM -- 02:15 PM (PST)

The human ability to cooperate in a wide range of contexts is a key ingredient in the success of our species. Problems of cooperation, in which agents seek ways to jointly improve their welfare, are ubiquitous and important. They can be found at every scale, from the daily routines of highway driving, communicating in a shared language, and work collaborations, to the global challenges of climate change, pandemic preparedness, and international trade. With AI agents playing an ever greater role in our lives, we must endow them with similar abilities. In particular, they must understand the behaviors of others, find common ground by which to communicate with them, make credible commitments, and establish institutions which promote cooperative behavior. By construction, the goal of Cooperative AI is interdisciplinary in nature. Therefore, our workshop will bring together scholars from diverse backgrounds including reinforcement learning (and inverse RL), multi-agent systems, human-AI interaction, game theory, mechanism design, social choice, fairness, cognitive science, language learning, and interpretability.

This year we will organize the workshop along two axes. First, we will discuss how to incentivize cooperation in AI systems, developing algorithms that can act effectively in general-sum settings and that encourage others to cooperate. The second focus is on how to implement effective coordination, given that cooperation is already incentivized. For example, we may examine zero-shot coordination, in which AI agents need to coordinate with novel partners at test time. This setting is highly relevant to human-AI coordination, and provides a stepping stone for the community towards full Cooperative AI.

Tue 5:20 a.m. - 5:30 a.m. Welcome and Opening Remarks
Edward Hughes · Natasha Jaques

Tue 5:30 a.m. - 6:00 a.m. Invited Talk: Bo An (Nanyang Technological University) on Learning to Coordinate in Complex Environments
Bo An

Tue 6:00 a.m. - 6:30 a.m. Invited Talk: Michael Muthukrishna (London School of Economics) on Cultural Evolution and Human Cooperation
In the modern world, we cooperate with and live side by side with strangers, who often look, act, and speak in ways very different to us. We work together on goals with culturally distant nations that span the globe. I'm recording this talk, but I could have given it to you in person. That's unusual in many respects. It's unusual from a cross-species perspective - comparing us to our closest primate cousins, a room full of strange chimps is a room full of dead chimps. It's unusual from a historical perspective - even a few hundred years ago, a stranger in our midst was a potential threat. And it's unusual from a geographic perspective - even today some places are safer and more cooperative than others. Cooperation varies in scale, intensity, and domain - some countries cooperate on healthcare, others on defence. Compounding the puzzle, the evolutionary mechanisms that explain cooperation undermine one another and can stabilize non-cooperative or even maladaptive behavior. I'll discuss the latest discoveries in the science of cultural evolution and human cooperation and how these might apply to the development of cooperative AI.
Michael Muthukrishna

Tue 6:30 a.m. - 7:00 a.m. Invited Talk: Pablo Castro (Google Brain) on Estimating Policy Functions in Payment Systems using Reinforcement Learning
In this talk I will present some of our findings (in collaboration with the Bank of Canada) on using RL to approximate the policy rules of banks participating in a high-value payments system. The objective of the agents is to learn a policy function for the choice of amount of liquidity provided to the system at the beginning of the day. Individual choices have complex strategic effects precluding a closed-form solution of the optimal policy, except in simple cases. We show that in a simplified two-agent setting, agents using reinforcement learning do learn the optimal policy that minimizes the cost of processing their individual payments. We also show that in more complex settings, both agents learn to reduce their liquidity costs. Our results show the applicability of RL to estimate best-response functions in real-world strategic games.
Pablo Samuel Castro

Tue 7:00 a.m. - 7:15 a.m. (Live) Q&A with Invited Speaker (Bo An)

Tue 7:15 a.m. - 7:30 a.m. (Live) Q&A with Invited Speaker (Michael Muthukrishna)

Tue 7:30 a.m. - 7:45 a.m. (Live) Q&A with Invited Speaker (Pablo Castro)

Tue 7:45 a.m. - 8:15 a.m. Invited Talk: Ariel Procaccia (Harvard University) on Democracy and the Pursuit of Randomness
Sortition is a storied paradigm of democracy built on the idea of choosing representatives through lotteries instead of elections. In recent years this idea has found renewed popularity in the form of citizens’ assemblies, which bring together randomly selected people from all walks of life to discuss key questions and deliver policy recommendations. A principled approach to sortition, however, must resolve the tension between two competing requirements: that the demographic composition of citizens’ assemblies reflect the general population and that every person be given a fair chance (literally) to participate. I will describe our work on designing, analyzing and implementing randomized participant selection algorithms that balance these two requirements. I will also discuss practical challenges in sortition based on experience with the adoption and deployment of our open-source system, Panelot.
Ariel Procaccia

Tue 8:15 a.m. - 8:45 a.m. Invited Talk: Dorsa Sadigh (Stanford University) on The Role of Conventions in Adaptive Human-AI Interaction
Today I will be talking about the role of conventions in human-AI collaboration. Conventions are norms/equilibria we build through repeated interactions with each other. The idea of conventions has been well-studied in linguistics. We will start the talk by discussing the notion of linguistic conventions, and how we can build AI agents that can effectively build these conventions. We then extend the idea of linguistic conventions to conventions through actions. We discuss a modular approach to separate partner-specific conventions and rule-dependent representations. We then discuss how this can be done effectively when working with partners whose actions are high dimensional. Finally we extend the notion of conventions to larger scale systems beyond dyadic interactions. Specifically, we discuss what conventions/equilibria emerge in mixed-autonomy traffic networks and how that can be leveraged for better dynamic routing of vehicles.
Dorsa Sadigh

Tue 8:45 a.m. - 9:15 a.m. (Live) Invited Talk: Nika Haghtalab (UC Berkeley) on Collaborative Machine Learning: Training and Incentives
Many modern machine learning paradigms require large amounts of data and computation power that are rarely seen in one place or owned by one agent. In recent years, methods such as federated learning have been embraced as an approach for bringing about collaboration across learning agents. In practice, the success of these methods relies upon our ability to pool together the efforts of large numbers of individual learning agents, data set owners, and curators. In this talk, I will discuss how recruiting, serving, and retaining these agents requires us to address agents’ needs, limitations, and responsibilities. In particular, I will discuss two major questions in this field. First, how can we design collaborative learning mechanisms that benefit agents with heterogeneous learning objectives? Second, how can we ensure that the burden of data collection and learning is shared equitably between agents?
Nika Haghtalab

Tue 9:15 a.m. - 9:30 a.m. (Live) Q&A with Invited Speaker (Ariel Procaccia)

Tue 9:30 a.m. - 9:45 a.m. (Live) Q&A with Invited Speaker (Dorsa Sadigh)

Tue 9:45 a.m. - 10:00 a.m. (Live) Q&A with Invited Speaker (Nika Haghtalab)

Tue 10:00 a.m. - 11:00 a.m. Workshop Poster Session 1 (hosted in GatherTown)

Tue 11:00 a.m. - 12:00 p.m. Workshop Poster Session 2 (hosted in GatherTown)

Tue 12:00 p.m. - 1:00 p.m. (Live) Panel Discussion: Cooperative AI
Kalesha Bullard · Allan Dafoe · Fei Fang · Chris Amato · Elizabeth M. Adams

Tue 1:00 p.m. - 1:15 p.m. Spotlight Talk: Interactive Inverse Reinforcement Learning for Cooperative Games
We study the problem of designing AI agents that cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov Decision Process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent's policy. How should the first agent act so it can learn the joint reward function as quickly as possible, and so that the joint policy is as close to optimal as possible? In this paper, we analyse how knowledge about the reward function can be gained. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.
Thomas Kleine Büning · Anne-Marie George · Christos Dimitrakakis

Tue 1:15 p.m. - 1:30 p.m. Spotlight Talk: Learning to solve complex tasks by growing knowledge culturally across generations
Knowledge built culturally across generations allows humans to learn far more than an individual could glean from their own experience in a lifetime. Cultural knowledge in turn rests on language: language is the richest record of what previous generations believed, valued, and practiced, and how these evolved over time. The power and mechanisms of language as a means of cultural learning, however, are not well understood, and as a result, current AI systems do not leverage language as a means for cultural knowledge transmission. Here, we take a first step towards reverse-engineering cultural learning through language. We developed a suite of complex tasks in the form of minimalist-style video games, which we deployed in an iterated learning paradigm. Human participants were limited to only two attempts (two lives) to beat each game and were allowed to write a message to a future participant who read the message before playing. Knowledge accumulated gradually across generations, allowing later generations to advance further in the games and perform more efficient actions. Multigenerational learning followed a strikingly similar trajectory to individuals learning alone with an unlimited number of lives. These results suggest that language provides a sufficient medium to express and accumulate the knowledge people acquire in these diverse tasks: the dynamics of the environment, valuable goals, dangerous risks, and strategies for success. The video game paradigm we pioneer here is thus a rich test bed for developing AI systems capable of acquiring and transmitting cultural knowledge.
Noah Goodman · Josh Tenenbaum · MH Tessler · Jason Madeano

Tue 1:30 p.m. - 1:45 p.m. Spotlight Talk: On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)
Mean field control (MFC) is an effective way to mitigate the curse of dimensionality of cooperative multi-agent reinforcement learning (MARL) problems. This work considers a collection of $N_{\mathrm{pop}}$ heterogeneous agents that can be segregated into $K$ classes such that the $k$-th class contains $N_k$ homogeneous agents. We aim to prove approximation guarantees of the MARL problem for this heterogeneous system by its corresponding MFC problem. We consider three scenarios where the reward and transition dynamics of all agents are respectively taken to be functions of (1) joint state and action distributions across all classes, (2) individual distributions of each class, and (3) marginal distributions of the entire population. We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}\left(\frac{\sqrt{|\mathcal{X}||\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k}\right)$, $e_2=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\sum_{k}\frac{1}{\sqrt{N_k}}\right)$ and $e_3=\mathcal{O}\left(\sqrt{|\mathcal{X}||\mathcal{U}|}\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent. Finally, we design a Natural Policy Gradient (NPG) based algorithm that, in the three cases stated above, can converge to an optimal MARL policy within $\mathcal{O}(e_j)$ error with a sample complexity of $\mathcal{O}(e_j^{-3})$, $j\in\{1,2,3\}$, respectively.
Mridul Agarwal · Vaneet Aggarwal · Washim Mondal · Satish Ukkusuri

Tue 1:45 p.m. - 2:00 p.m. Spotlight Talk: Public Information Representation for Adversarial Team Games
The study of sequential games in which a team plays against an adversary is receiving increasing attention in the scientific literature. Their peculiarity resides in the asymmetric information available to the team members during play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to Linear Programming and column generation techniques. Such representations prevent the adoption of standard tools for generating abstractions, which have previously proved crucial when solving huge two-player zero-sum games. Differently from those works, we investigate the problem of designing a suitable game representation over which abstraction algorithms can work. In particular, our algorithms convert a sequential team game with adversaries to a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player which only knows information common to the whole team and prescribes to the players an action for any possible private state. Our conversion enables the adoption of highly scalable techniques already available for two-player zero-sum games, including techniques for generating automated abstractions. Because of the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one. To limit this explosion, we design three pruning techniques that dramatically reduce the size of the tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc Poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games on the converted games.
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti

Tue 2:00 p.m. - 2:15 p.m. Closing Remarks
Gillian Hadfield

#### Author Information

##### Jakob Foerster (University of Oxford)

Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 2020/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.