Sponsored by the Center for Human-Compatible AI at UC Berkeley, and with support from the Simons Institute and the Center for Long-Term Cybersecurity, we are convening a cross-disciplinary group of researchers to examine the near-term policy concerns of Reinforcement Learning (RL). RL is a rapidly growing branch of AI research, with the capacity to learn to exploit our dynamic behavior in real time. From YouTube’s recommendation algorithm to post-surgery opioid prescriptions, RL algorithms are poised to permeate our daily lives. The ability of the RL system to tease out behavioral responses, and the human experimentation inherent to its learning, motivate a range of crucial policy questions about RL’s societal implications that are distinct from those addressed in the literature on other branches of Machine Learning (ML).
Tue 4:00 a.m. - 5:00 a.m. | Pre-show meet and greet (Gather Town session)
Tue 5:00 a.m. - 5:20 a.m. | Welcome (Brief introduction) | Brief opening remarks from the workshop organizers | Aaron Snoswell · Thomas Gilbert · Michael Dennis · Tom O Zick
Tue 5:20 a.m. - 5:40 a.m. | Culturing PERLS (Plenary presentation) | Mark Nitzberg
Tue 5:40 a.m. - 5:55 a.m. | Audience Q+A for plenary presentation (Live Q+A) | Mark Nitzberg · Aaron Snoswell
Tue 5:55 a.m. - 6:00 a.m. | 5 minute break
Tue 6:00 a.m. - 6:10 a.m. | V&S | Theme and speaker introductions (Brief introduction) | Michael Dennis
Tue 6:10 a.m. - 6:35 a.m. | V&S | RL Fictions (Presentation) | Stuart J Russell
Tue 6:35 a.m. - 7:00 a.m. | V&S | Assumptions of Making Things Computable (Presentation) | Mireille Hildebrandt
Tue 7:00 a.m. - 7:40 a.m. | V&S | Panel discussion (Live panel discussion) | Michael Dennis · Stuart J Russell · Mireille Hildebrandt · Salome Viljoen · Natasha Jaques
Tue 7:40 a.m. - 7:50 a.m. | 10 minute break
Tue 7:50 a.m. - 8:00 a.m. | LAF | Theme and speaker introductions (Brief introduction) | Aaron Snoswell
Tue 8:00 a.m. - 8:10 a.m. | LAF | "Legitimacy" in the Computational Elicitation of Preferences in Mechanism Design (Short presentation) | Jake Goldenfein
Tue 8:10 a.m. - 8:20 a.m. | LAF | The Role of Explanation in RL Legitimacy, Accountability, and Feedback (Short presentation) | Finale Doshi-Velez
Tue 8:20 a.m. - 8:30 a.m. | LAF | Evaluating Reinforcement Learners (Short presentation) | Michael Littman
Tue 8:30 a.m. - 9:15 a.m. | LAF | Panel discussion (Live panel discussion) | Aaron Snoswell · Jake Goldenfein · Finale Doshi-Velez · Evi Micha · Ivana Dusparic · Jonathan Stray
Tue 9:15 a.m. - 10:00 a.m. | 45 minute lunch break
Tue 10:00 a.m. - 11:55 a.m. | Poster session for accepted papers (Gather Town session)
Tue 11:55 a.m. - 12:00 p.m. | 5 minute break
Tue 12:00 p.m. - 12:10 p.m. | TD | Theme and speaker introductions (Brief introduction) | Thomas Gilbert
Tue 12:10 p.m. - 12:20 p.m. | TD | Antimonopoly as a Tool for Democratization? (Short presentation) | Ayse Yasar
Tue 12:20 p.m. - 12:30 p.m. | TD | Reinforcement of What? Shaping the Digitization of Judgement by Reinforcement Learning (Short presentation) | Frank Pasquale
Tue 12:30 p.m. - 12:40 p.m. | TD | Metrics are Tricky (Short presentation) | Rachel Thomas
Tue 12:40 p.m. - 1:30 p.m. | TD | Panel Discussion (Live panel discussion) | Thomas Gilbert · Ayse Yasar · Rachel Thomas · Mason Kortz · Frank Pasquale · Jessica Forde
Tue 1:30 p.m. - 1:45 p.m. | Closing remarks (Brief conclusion) | Thomas Gilbert · Aaron Snoswell · Michael Dennis · Tom O Zick
Deciding What's Fair: Challenges of Applying Reinforcement Learning in Online Marketplaces (Poster)
Reinforcement learning (RL) techniques offer a versatile and powerful extension to the toolkit of computer scientists and marketplace designers working on online marketplaces. As the use of these techniques continues to expand, their application in online marketplaces raises questions of appropriate use, particularly around issues of fairness and market transparency. I argue that the use of RL techniques, echoing similar calls in domains such as automated vehicle systems, presents a problem of sociotechnical specification that faces a set of normative and regulatory challenges unique to marketplaces. I provide a selective overview of the RL literature as applied to markets to illustrate challenges associated with the use of RL techniques in online marketplaces. I conclude with a discussion of the capacity-building in research and institutions that is required for the benefits of algorithmically managed marketplaces to be realized for stakeholders and broader society.
Andrew Chong
Robust Algorithmic Collusion (Poster)
This paper develops an approach to assess reinforcement learners with collusive pricing policies in a testing environment. We find that algorithms are unable to extrapolate collusive policies from their training environment to testing environments. Collusion consistently breaks down, and algorithms instead tend to converge to Nash prices. Policy updating, with or without exploration, re-establishes collusion, but only in the current environment. This is robust to repeated learning across environments. Our results indicate that frequent market interaction, coordination of algorithm design, and stable environments are essential for algorithmic collusion.
Nicolas Eschenbaum · Philipp Zahn
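The train/test protocol this abstract describes can be made concrete with a small sketch: two independent Q-learning agents repeatedly set prices in a toy duopoly, and the learned policies are then examined outside the training environment. This is a minimal illustration only; the demand model, price grid, and all hyperparameters below are assumptions for exposition, not the authors' actual implementation.

```python
# Illustrative sketch: independent Q-learning pricers in a repeated duopoly.
# The demand model, price grid, and hyperparameters are assumed, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
PRICES = np.linspace(1.0, 2.0, 5)   # discrete price grid (assumed)
N_ACTIONS = len(PRICES)

def profit(p_i, p_j, cost=1.0, shock=0.0):
    """Toy logit demand; `shock` shifts marginal cost to create a
    changed test environment."""
    demand = np.exp(-p_i) / (np.exp(-p_i) + np.exp(-p_j))
    return (p_i - cost - shock) * demand

def train(steps=20000, eps=0.1, alpha=0.1, gamma=0.95, shock=0.0):
    """Independent Q-learning; the state is both firms' previous prices."""
    q = [np.zeros((N_ACTIONS, N_ACTIONS, N_ACTIONS)) for _ in range(2)]
    s = (0, 0)
    for _ in range(steps):
        acts = [int(rng.integers(N_ACTIONS)) if rng.random() < eps
                else int(q[i][s].argmax()) for i in range(2)]
        rewards = [profit(PRICES[acts[0]], PRICES[acts[1]], shock=shock),
                   profit(PRICES[acts[1]], PRICES[acts[0]], shock=shock)]
        s_next = (acts[0], acts[1])
        for i in range(2):
            target = rewards[i] + gamma * q[i][s_next].max()
            q[i][s][acts[i]] += alpha * (target - q[i][s][acts[i]])
        s = s_next
    return q

def mean_greedy_price(q, steps=100):
    """Roll out the frozen greedy policies and report the average price."""
    s, prices = (0, 0), []
    for _ in range(steps):
        acts = tuple(int(q[i][s].argmax()) for i in range(2))
        prices.append(np.mean([PRICES[a] for a in acts]))
        s = acts
    return float(np.mean(prices))

q = train(shock=0.0)  # training environment
print("mean greedy price after training:", mean_greedy_price(q))
# Re-training with shock != 0, or evaluating the frozen policies' profits
# under a shock, mirrors the train/test environment split under which the
# abstract reports that collusion fails to extrapolate.
```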
Power and Accountability in RL-driven Environmental Policy (Poster)
Machine learning (ML) methods already permeate environmental decision-making, from processing high-dimensional data on earth systems to monitoring compliance with environmental regulations. Of the ML techniques available to address pressing environmental problems (e.g., climate change, biodiversity loss), Reinforcement Learning (RL) may both hold the greatest promise and present the most pressing perils. This paper explores how RL-driven policy refracts existing power relations in the environmental domain while also creating unique challenges to ensuring equitable and accountable environmental decision processes. We focus on how RL technologies shift the distribution of decision-making, agenda-setting, and ideological power between resource users, governing bodies, and private industry.
Melissa Chapman · Caleb Scoville · Carl Boettiger
Calculus of Consent via MARL: Legitimating the Collaborative Governance Supplying Public Goods (Poster)
Public policies that supply public goods, especially those that enforce collaboration by limiting individual liberty, always give rise to controversies over governance legitimacy. Multi-Agent Reinforcement Learning (MARL) methods are appropriate for supporting the legitimacy of public policies that supply public goods at the cost of individual interests. Among these policies, inter-regional collaborative pandemic control is a prominent example, one that has become far more important for an increasingly interconnected world facing a global pandemic like COVID-19. Different patterns of collaborative strategies have been observed among different systems of regions, yet an analytical process for reasoning about the legitimacy of those strategies has been lacking. In this paper, we use inter-regional collaboration for pandemic control as an example to demonstrate the necessity of MARL in reasoning about, and thereby legitimizing, policies that enforce such collaboration. Experimental results in an exemplary environment show that our MARL approach is able to demonstrate the effectiveness and necessity of restrictions on individual liberty for the collaborative supply of public goods. Our MARL agents learn different optimal policies under different collaboration levels; these policies change in an interpretable pattern of collaboration that helps to balance the losses suffered by regions of different types, and consequently promotes overall welfare. Meanwhile, policies learned at higher collaboration levels yield higher global rewards, which illustrates the benefit of, and thus provides a novel justification for the legitimacy of, promoting inter-regional collaboration. Our method thus shows the capability of MARL to computationally model and support the theory of the calculus of consent developed by Nobel laureate J. M. Buchanan.
Yang Hu · Zhui Zhu · Sirui Song · Xue (Steve) Liu · Yang Yu
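As a rough illustration of the mechanism this abstract relies on, the sketch below builds a "collaboration level" directly into the agents' rewards: each region's reward mixes its own welfare with the group average, weighted by a parameter `collab`. The dynamics, payoffs, and parameter names are assumptions for exposition, not the authors' model.

```python
# Illustrative sketch: independent learners with a tunable collaboration level.
# Stricter restrictions cost a region locally but shrink a shared public bad.
import numpy as np

rng = np.random.default_rng(1)
ACTIONS = [0.0, 0.5, 1.0]   # restriction strictness levels (assumed grid)
N_REGIONS = 3

def step_rewards(actions, collab):
    """Each region pays a private cost for its own restrictions and shares
    the infection burden; `collab` in [0, 1] mixes selfish and group goals."""
    infections = 10.0 / (1.0 + sum(actions))           # shared public bad
    local = [-2.0 * a - infections for a in actions]   # per-region welfare
    group_mean = float(np.mean(local))
    return [(1 - collab) * l + collab * group_mean for l in local]

def train(collab, episodes=5000, eps=0.2, alpha=0.1):
    """Independent stateless Q-learning (one bandit per region)."""
    q = np.zeros((N_REGIONS, len(ACTIONS)))
    for _ in range(episodes):
        idx = [int(rng.integers(len(ACTIONS))) if rng.random() < eps
               else int(q[i].argmax()) for i in range(N_REGIONS)]
        rewards = step_rewards([ACTIONS[a] for a in idx], collab)
        for i, a in enumerate(idx):
            q[i, a] += alpha * (rewards[i] - q[i, a])
    return q

for collab in (0.0, 0.5, 1.0):
    q = train(collab)
    acts = [ACTIONS[int(q[i].argmax())] for i in range(N_REGIONS)]
    # Report unmixed (collab=0) welfare so runs are comparable.
    welfare = sum(step_rewards(acts, collab=0.0))
    print(f"collab={collab}: restrictions={acts}, total welfare={welfare:.2f}")
```

Under these toy payoffs, higher `collab` values push the learned policies toward stricter, more uniform restrictions and higher total welfare, loosely mirroring the abstract's finding that policies learned at higher collaboration levels yield higher global rewards.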
Demanding and Designing Aligned Cognitive Architectures (Poster)
With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many issues; here we examine two of them. The first is to clarify what demands stakeholders might usefully make on the designers of AI systems, useful in the sense that the technology exists to implement them. We introduce the framing of cognitive architectures to make this technical topic more accessible. The second issue is how stakeholders should calibrate their interactions with modern machine learning researchers. We consider how current fashions in machine learning create a narrative pull that participants in technical and policy discussions should be aware of, so that they can compensate for it. We identify several technically tractable but currently unfashionable options for improving AI alignment.
Koen Holtman
Author Information
Thomas Gilbert (UC Berkeley)
Stuart J Russell (UC Berkeley)
Tom O Zick (Harvard)
Tom Zick earned her PhD from UC Berkeley and is currently a fellow at the Berkman Klein Center for Internet and Society at Harvard. Her research bridges AI ethics and law, with a focus on how to craft safe and equitable policy for the adoption of AI in high-stakes domains. In the past, she has worked as a data scientist at the Berkeley Center for Law and Technology, evaluating the capacity of regulations to promote open government data. She has also collaborated with graduate students across social science and engineering to advocate for pedagogy reform focused on infusing social context into technical coursework. Outside of academia, Tom has crafted digital policy for the City of Boston as a fellow for the Mayor's Office of New Urban Mechanics. Her current research centers on the near-term policy concerns surrounding reinforcement learning.
Aaron Snoswell (Queensland University of Technology)
Aaron is a research fellow in computational law at the Australian Research Council Centre of Excellence for Automated Decision-Making and Society. He has a background in cross-disciplinary mechatronic engineering, and his Ph.D. research developed new theory and algorithms for Inverse Reinforcement Learning in the maximum conditional entropy and multiple intent settings. Aaron's ongoing work investigates technical measures for achieving value alignment in autonomous decision-making systems, and legal-theoretic models for AI accountability.
Michael Dennis (University of California Berkeley)
Michael Dennis is a 5th-year grad student at the Center for Human-Compatible AI. With a background in theoretical computer science, he is working to close the gap between decision-theoretic and game-theoretic recommendations and current state-of-the-art approaches to robust RL and multi-agent RL. The overall aim of this work is to ensure that our systems behave in a way that is robustly beneficial. In the single-agent setting, this means making decisions and managing risk in the way the designer intends. In the multi-agent setting, this means ensuring that the concerns of the designer and those of others in society are fairly and justly negotiated to the benefit of all involved.
More from the Same Authors
- 2021 : Grounding Aleatoric Uncertainty in Unsupervised Environment Design
  Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster
- 2021 : That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities
  Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2022 : Adversarial Policies Beat Professional-Level Go AIs
  Tony Wang · Adam Gleave · Nora Belrose · Tom Tseng · Michael Dennis · Yawen Duan · Viktor Pogrebniak · Joseph Miller · Sergey Levine · Stuart J Russell
- 2023 Poster: Bridging RL Theory and Practice with the Effective Horizon
  Cassidy Laidlaw · Stuart J Russell · Anca Dragan
- 2023 Oral: Bridging RL Theory and Practice with the Effective Horizon
  Cassidy Laidlaw · Stuart J Russell · Anca Dragan
- 2021 : Closing remarks
  Thomas Gilbert · Aaron Snoswell · Michael Dennis · Tom O Zick
- 2021 : TD | Panel Discussion
  Thomas Gilbert · Ayse Yasar · Rachel Thomas · Mason Kortz · Frank Pasquale · Jessica Forde
- 2021 : TD | Theme and speaker introductions
  Thomas Gilbert
- 2021 : LAF | Panel discussion
  Aaron Snoswell · Jake Goldenfein · Finale Doshi-Velez · Evi Micha · Ivana Dusparic · Jonathan Stray
- 2021 : LAF | Theme and speaker introductions
  Aaron Snoswell
- 2021 : V&S | Panel discussion
  Michael Dennis · Stuart J Russell · Mireille Hildebrandt · Salome Viljoen · Natasha Jaques
- 2021 : V&S | RL Fictions
  Stuart J Russell
- 2021 : V&S | Theme and speaker introductions
  Michael Dennis
- 2021 : Audience Q+A for plenary presentation
  Mark Nitzberg · Aaron Snoswell
- 2021 : Welcome
  Aaron Snoswell · Thomas Gilbert · Michael Dennis · Tom O Zick
- 2021 Poster: Replay-Guided Adversarial Environment Design
  Minqi Jiang · Michael Dennis · Jack Parker-Holder · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel
- 2020 Poster: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2020 Oral: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2019 : Hard Choices in AI Safety
  Roel Dobbe · Thomas Gilbert · Yonatan Mintz
- 2018 Poster: Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making
  Nishant Desai · Andrew Critch · Stuart J Russell
- 2017 Poster: Inverse Reward Design
  Dylan Hadfield-Menell · Smitha Milli · Pieter Abbeel · Stuart J Russell · Anca Dragan
- 2017 Oral: Inverse Reward Design
  Dylan Hadfield-Menell · Smitha Milli · Pieter Abbeel · Stuart J Russell · Anca Dragan
- 2016 Poster: Cooperative Inverse Reinforcement Learning
  Dylan Hadfield-Menell · Stuart J Russell · Pieter Abbeel · Anca Dragan
- 2015 Poster: Gaussian Process Random Fields
  Dave Moore · Stuart J Russell
- 2014 Workshop: 3rd NIPS Workshop on Probabilistic Programming
  Daniel Roy · Josh Tenenbaum · Thomas Dietterich · Stuart J Russell · YI WU · Ulrik R Beierholm · Alp Kucukelbir · Zenna Tavares · Yura Perov · Daniel Lee · Brian Ruttenberg · Sameer Singh · Michael Hughes · Marco Gaboardi · Alexey Radul · Vikash Mansinghka · Frank Wood · Sebastian Riedel · Prakash Panangaden
- 2014 Poster: Algorithm selection by rational metareasoning as a model of human strategy selection
  Falk Lieder · Dillon Plunkett · Jessica B Hamrick · Stuart J Russell · Nicholas Hay · Tom Griffiths
- 2013 Poster: Multilinear Dynamical Systems for Tensor Time Series
  Mark Rogers · Lei Li · Stuart J Russell
- 2010 Poster: Global seismic monitoring as probabilistic inference
  Nimar Arora · Stuart J Russell · Paul Kidwell · Erik Sudderth
- 2008 Poster: Probabilistic detection of short events, with application to critical care monitoring
  Norm Aleks · Stuart J Russell · Michael G Madden · Diane Morabito · Geoffrey T Manley · Kristan Staudenmayer · Mitchell Cohen