Sponsored by the Center for Human-Compatible AI at UC Berkeley, and with support from the Simons Institute and the Center for Long-Term Cybersecurity, we are convening a cross-disciplinary group of researchers to examine the near-term policy concerns of Reinforcement Learning (RL). RL is a rapidly growing branch of AI research with the capacity to learn to exploit our dynamic behavior in real time. From YouTube's recommendation algorithm to post-surgery opioid prescriptions, RL algorithms are poised to permeate our daily lives. An RL system's ability to tease out behavioral responses, and the human experimentation inherent in its learning, motivate a range of crucial policy questions about RL's societal implications that are distinct from those addressed in the literature on other branches of Machine Learning (ML).
Tue 4:00 a.m. - 5:00 a.m. | Pre-show meet and greet (Gather Town session)
Tue 5:00 a.m. - 5:20 a.m. | Welcome (Brief introduction; opening remarks from the workshop organizers) | Aaron Snoswell · Thomas Gilbert · Michael Dennis · Tom O Zick
Tue 5:20 a.m. - 5:40 a.m. | Culturing PERLS (Plenary presentation) | Mark Nitzberg
Tue 5:40 a.m. - 5:55 a.m. | Audience Q+A for plenary presentation (Live Q+A) | Mark Nitzberg · Aaron Snoswell
Tue 5:55 a.m. - 6:00 a.m. | 5 minute break
Tue 6:00 a.m. - 6:10 a.m. | V&S: Theme and speaker introductions (Brief introduction) | Michael Dennis
Tue 6:10 a.m. - 6:35 a.m. | V&S: RL Fictions (Presentation) | Stuart J Russell
Tue 6:35 a.m. - 7:00 a.m. | V&S: Assumptions of Making Things Computable (Presentation) | Mireille Hildebrandt
Tue 7:00 a.m. - 7:40 a.m. | V&S: Panel discussion (Live panel discussion) | Michael Dennis · Stuart J Russell · Mireille Hildebrandt · Salome Viljoen · Natasha Jaques
Tue 7:40 a.m. - 7:50 a.m. | 10 minute break
Tue 7:50 a.m. - 8:00 a.m. | LAF: Theme and speaker introductions (Brief introduction) | Aaron Snoswell
Tue 8:00 a.m. - 8:10 a.m. | LAF: "Legitimacy" in the Computational Elicitation of Preferences in Mechanism Design (Short presentation) | Jake Goldenfein
Tue 8:10 a.m. - 8:20 a.m. | LAF: The Role of Explanation in RL Legitimacy, Accountability, and Feedback (Short presentation) | Finale Doshi-Velez
Tue 8:20 a.m. - 8:30 a.m. | LAF: Evaluating Reinforcement Learners (Short presentation) | Michael Littman
Tue 8:30 a.m. - 9:15 a.m. | LAF: Panel discussion (Live panel discussion) | Aaron Snoswell · Jake Goldenfein · Finale Doshi-Velez · Evi Micha · Ivana Dusparic · Jonathan Stray
Tue 9:15 a.m. - 10:00 a.m. | 45 minute lunch break
Tue 10:00 a.m. - 11:55 a.m. | Poster session for accepted papers (Gather Town session)
Tue 11:55 a.m. - 12:00 p.m. | 5 minute break
Tue 12:00 p.m. - 12:10 p.m. | TD: Theme and speaker introductions (Brief introduction) | Thomas Gilbert
Tue 12:10 p.m. - 12:20 p.m. | TD: Antimonopoly as a Tool for Democratization? (Short presentation) | Ayse Yasar
Tue 12:20 p.m. - 12:30 p.m. | TD: Reinforcement of What? Shaping the Digitization of Judgement by Reinforcement Learning (Short presentation) | Frank Pasquale
Tue 12:30 p.m. - 12:40 p.m. | TD: Metrics are Tricky (Short presentation) | Rachel Thomas
Tue 12:40 p.m. - 1:30 p.m. | TD: Panel discussion (Live panel discussion) | Thomas Gilbert · Ayse Yasar · Rachel Thomas · Mason Kortz · Frank Pasquale · Jessica Forde
Tue 1:30 p.m. - 1:45 p.m. | Closing remarks (Brief conclusion) | Thomas Gilbert · Aaron Snoswell · Michael Dennis · Tom O Zick
Deciding What's Fair: Challenges of Applying Reinforcement Learning in Online Marketplaces (Poster) | Andrew Chong
Reinforcement learning (RL) techniques offer a versatile and powerful extension to the toolkit of computer scientists and marketplace designers. As the use of these techniques continues to expand, their application in online marketplaces raises questions about appropriate use, particularly around issues of fairness and market transparency. I argue that the use of RL techniques, as with similar calls in domains such as automated vehicle systems, poses a problem of sociotechnical specification that faces a set of normative and regulatory challenges unique to marketplaces. I provide a selective overview of the RL literature as applied to markets to illustrate the challenges associated with using RL techniques in online marketplaces. I conclude with a discussion of the capacity-building in research and institutions required for the benefits of algorithmically managed marketplaces to be realized for stakeholders and broader society.
Robust Algorithmic Collusion (Poster) | Nicolas Eschenbaum · Philipp Zahn
This paper develops an approach to assess reinforcement learners with collusive pricing policies in a testing environment. We find that algorithms are unable to extrapolate collusive policies from their training environment to testing environments. Collusion consistently breaks down, and algorithms instead tend to converge to Nash prices. Policy updating with or without exploration re-establishes collusion, but only in the current environment. This is robust to repeated learning across environments. Our results indicate that frequent market interaction, coordination of algorithm design, and stable environments are essential for algorithmic collusion.
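The training side of this kind of experiment is easier to picture with a toy version. The following is a minimal sketch, not the authors' code: two independent tabular Q-learning agents repeatedly set prices in a simulated duopoly, and sustained prices above the competitive level are read as tacit collusion. The price grid, logit demand model, and hyperparameters are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): two
# independent Q-learners price in a logit-demand duopoly; prices that
# settle above the one-shot Nash level indicate tacit collusion.
import numpy as np

rng = np.random.default_rng(0)
PRICES = np.linspace(1.0, 2.0, 10)   # discrete price grid (assumed)
N_ACTIONS = len(PRICES)

def profits(p_i, p_j, cost=1.0, mu=0.25):
    """Per-period profits under a simple logit demand model (assumed)."""
    u = np.exp((2.0 - np.array([p_i, p_j])) / mu)
    shares = u / (u.sum() + 1.0)      # outside option has weight 1
    return (np.array([p_i, p_j]) - cost) * shares

def train(episodes=50_000, alpha=0.1, gamma=0.95, eps_decay=1e-4):
    # State = both agents' previous prices; Q[i] is agent i's table.
    Q = [rng.normal(0, 0.01, (N_ACTIONS, N_ACTIONS, N_ACTIONS))
         for _ in range(2)]
    state = (0, 0)
    for t in range(episodes):
        eps = np.exp(-eps_decay * t)  # decaying exploration
        acts = [rng.integers(N_ACTIONS) if rng.random() < eps
                else int(np.argmax(Q[i][state])) for i in range(2)]
        rewards = profits(PRICES[acts[0]], PRICES[acts[1]])
        nxt = (acts[0], acts[1])
        for i in range(2):            # independent Q-learning updates
            td = rewards[i] + gamma * Q[i][nxt].max() - Q[i][state][acts[i]]
            Q[i][state][acts[i]] += alpha * td
        state = nxt
    return Q, state

Q, state = train()
print("Final prices:", PRICES[state[0]], PRICES[state[1]])
```

In the spirit of the paper's test, one would then freeze the learned tables and re-evaluate the greedy policies in a perturbed environment (e.g., different demand parameters) to see whether supra-competitive prices persist or, as the abstract reports, break down toward Nash pricing.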
Power and Accountability in RL-driven Environmental Policy (Poster) | Melissa Chapman · Caleb Scoville · Carl Boettiger
Machine learning (ML) methods already permeate environmental decision-making, from processing high-dimensional data on earth systems to monitoring compliance with environmental regulations. Of the ML techniques available to address pressing environmental problems (e.g., climate change, biodiversity loss), Reinforcement Learning (RL) may both hold the greatest promise and present the most pressing perils. This paper explores how RL-driven policy refracts existing power relations in the environmental domain while also creating unique challenges to ensuring equitable and accountable environmental decision processes. We focus on how RL technologies shift the distribution of decision-making, agenda-setting, and ideological power between resource users, governing bodies, and private industry.
Calculus of Consent via MARL: Legitimating the Collaborative Governance Supplying Public Goods (Poster) | Yang Hu · Zhui Zhu · Sirui Song · Xue (Steve) Liu · Yang Yu
Public policies that supply public goods, especially those that involve collaboration by limiting individual liberty, always give rise to controversies over governance legitimacy. Multi-Agent Reinforcement Learning (MARL) methods are appropriate for supporting the legitimacy of public policies that supply public goods at the cost of individual interests. Among these policies, inter-regional collaborative pandemic control is a prominent example, one that has become much more important for an increasingly inter-connected world facing a global pandemic like COVID-19. Different patterns of collaborative strategies have been observed among different systems of regions, yet an analytical process for reasoning about the legitimacy of those strategies has been lacking. In this paper, we use inter-regional collaboration for pandemic control as an example to demonstrate the necessity of MARL in reasoning about, and thereby legitimizing, policies that enforce such inter-regional collaboration. Experimental results in an exemplary environment show that our MARL approach is able to demonstrate the effectiveness and necessity of restrictions on individual liberty for the collaborative supply of public goods. Our MARL agents learn different optimal policies under different collaboration levels; these policies change in an interpretable pattern that helps to balance the losses suffered by regions of different types, and consequently promotes the overall welfare. Meanwhile, policies learned with higher collaboration levels yield higher global rewards, which illustrates the benefit of, and thus provides a novel justification for the legitimacy of, promoting inter-regional collaboration. Our method therefore shows the capability of MARL to computationally model and support the theory of the calculus of consent developed by Nobel laureate J. M. Buchanan.
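One common way to encode a "collaboration level" in a MARL reward is sketched below under our own assumptions; this is not the paper's model. A coefficient c in [0, 1] mixes each region's own welfare with the average welfare of the other regions, so that sweeping c and comparing the learned policies and global rewards mirrors the abstract's comparison across collaboration levels.

```python
# Minimal sketch (assumptions, not the paper's model): a collaboration
# level c in [0, 1] mixes each region's own welfare with the mean welfare
# of the other regions. At c = 0 each region trains selfishly; at c = 1
# all regions optimize the common good.
import numpy as np

def shaped_rewards(local_welfare: np.ndarray, c: float) -> np.ndarray:
    """Return each region's training reward given raw per-region welfare."""
    n = len(local_welfare)
    others_mean = (local_welfare.sum() - local_welfare) / (n - 1)
    return (1.0 - c) * local_welfare + c * others_mean

# Example: region 0 bears a lockdown cost that benefits regions 1 and 2.
welfare = np.array([-3.0, 2.0, 2.0])
for c in (0.0, 0.5, 1.0):
    print(f"c={c}: rewards={shaped_rewards(welfare, c)}")
```

Under this kind of shaping, a higher c makes the liberty-restricting action individually rational for the region that bears its cost, which is one way to read the abstract's claim that higher collaboration levels yield higher global rewards.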
Demanding and Designing Aligned Cognitive Architectures (Poster) | Koen Holtman
With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many issues; here we examine two of them. The first is to clarify what demands stakeholders might usefully make on the designers of AI systems, useful because the technology exists to implement them. We introduce the framing of cognitive architectures to make this technical topic more accessible. The second issue is how stakeholders should calibrate their interactions with modern machine learning researchers. We consider how current fashions in machine learning create a narrative pull that participants in technical and policy discussions should be aware of, so that they can compensate for it. We identify several technically tractable but currently unfashionable options for improving AI alignment.