To be helpful to users and to society at large, an autonomous agent needs to be aligned with the objectives of its stakeholders. Misaligned incentives are a common and crucial problem with human agents, and we should expect similar challenges to arise with artificial agents. For example, it is not uncommon to see reinforcement learning agents ‘hack’ their specified reward function. How do we build learning systems that will reliably achieve a user's intended objective? How can we ensure that autonomous agents behave reliably in unforeseen situations? How do we design systems whose behavior will be aligned with the values and goals of society at large? As AI capabilities develop, it is crucial for the AI community to arrive at satisfying and trustworthy answers to these questions.
This workshop will focus on three central challenges in value alignment: learning complex rewards that reflect human preferences (e.g. meaningful oversight, preference elicitation, inverse reinforcement learning, learning from demonstration or feedback), engineering reliable AI systems (e.g. robustness to distributional shift, model misspecification, or adversarial data, via methods such as adversarial training, KWIK-style learning, or transparency to human inspection), and dealing with bounded rationality and incomplete information in both AI systems and their users (e.g. acting on incomplete task specifications, learning from users who sometimes make mistakes). We also welcome submissions that do not directly fit these categories but generally deal with problems relating to value alignment in artificial intelligence.
Sat 9:15 a.m. - 9:30 a.m. | Opening Remarks (Talk) | Dylan Hadfield-Menell
Sat 9:30 a.m. - 9:35 a.m. | Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning (Talk) | Hadrien Hendrikx
Sat 9:45 a.m. - 10:15 a.m. | Minimax-Regret Querying on Side Effects in Factored Markov Decision Processes (Talk) | Satinder Singh
Sat 10:15 a.m. - 10:30 a.m. | Robust Covariate Shift with Exact Loss Functions (Contributed Talk) | Anqi Liu
Sat 11:00 a.m. - 11:30 a.m. | Adversarial Robustness for Aligned AI (Talk) | Ian Goodfellow
Sat 11:30 a.m. - 12:00 p.m. | Incomplete Contracting and AI Alignment (Talk) | Gillian Hadfield
Sat 1:15 p.m. - 1:45 p.m. | Learning from Human Feedback (Talk) | Paul Christiano
Sat 1:45 p.m. - 2:00 p.m. | Finite Supervision Reinforcement Learning (Contributed Talk) | William Saunders · Eric Langlois
Sat 2:00 p.m. - 2:15 p.m. | Safer Classification by Synthesis (Contributed Talk) | William Wang
Sat 2:15 p.m. - 3:00 p.m. | Aligned AI Poster Session (Poster Session) | Amanda Askell · Rafal Muszynski · William Wang · Yaodong Yang · Quoc Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet · Candice Schumann · Anqi Liu · Peter Eckersley · Angelina Wang · William Saunders
Sat 3:30 p.m. - 4:00 p.m. | Machine Learning for Human Deliberative Judgment (Talk) | Owain Evans
Sat 4:00 p.m. - 4:30 p.m. | Learning Reward Functions (Talk) | Jan Leike
Sat 4:30 p.m. - 5:00 p.m. | Informal Technical Discussion: Open Problems in AI Alignment (Discussion)
Author Information
Dylan Hadfield-Menell (UC Berkeley)
Jacob Steinhardt (UC Berkeley)
David Duvenaud (University of Toronto)
David Duvenaud is an assistant professor in computer science at the University of Toronto. His research focuses on continuous-time models, latent-variable models, and deep learning. He completed his postdoc at Harvard University and his Ph.D. at the University of Cambridge. David also co-founded Invenia, an energy forecasting and trading company.
David Krueger (Montreal Institute for Learning Algorithms)
Anca Dragan (UC Berkeley)
More from the Same Authors
- 2021 Spotlight: Learning Equilibria in Matching Markets from Bandit Feedback
  Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt
- 2021: Measuring Coding Challenge Competence With APPS
  Dan Hendrycks · Steven Basart · Saurav Kadavath · Mantas Mazeika · Akul Arora · Ethan Guo · Collin Burns · Samir Puranik · Horace He · Dawn Song · Jacob Steinhardt
- 2021: PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
  Dan Hendrycks · Andy Zou · Mantas Mazeika · Leonard Tang · Dawn Song · Jacob Steinhardt
- 2021: Effect of Model Size on Worst-group Generalization
  Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt
- 2021: The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
  Alexander Pan · Kush Bhatia · Jacob Steinhardt
- 2021: What Would Jiminy Cricket Do? Towards Agents That Behave Morally
  Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt
- 2021: Measuring Mathematical Problem Solving With the MATH Dataset
  Dan Hendrycks · Collin Burns · Saurav Kadavath · Akul Arora · Steven Basart · Eric Tang · Dawn Song · Jacob Steinhardt
- 2022: Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations
  Yongyi Yang · Jacob Steinhardt · Wei Hu
- 2022: Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt
- 2022 Workshop: Workshop on Machine Learning Safety
  Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
- 2022 Workshop: The Symbiosis of Deep Learning and Differential Equations II
  Michael Poli · Winnie Xu · Estefany Kelly Buchanan · Maryam Hosseini · Luca Celotti · Martin Magill · Ermal Rrapaj · Qiyao Wei · Stefano Massaroli · Patrick Kidger · Archis Joglekar · Animesh Garg · David Duvenaud
- 2022 Poster: How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
  Mantas Mazeika · Eric Tang · Andy Zou · Steven Basart · Jun Shern Chan · Dawn Song · David Forsyth · Jacob Steinhardt · Dan Hendrycks
- 2022 Poster: Capturing Failures of Large Language Models via Human Cognitive Biases
  Erik Jones · Jacob Steinhardt
- 2022 Poster: Forecasting Future World Events With Neural Networks
  Andy Zou · Tristan Xiao · Ryan Jia · Joe Kwon · Mantas Mazeika · Richard Li · Dawn Song · Jacob Steinhardt · Owain Evans · Dan Hendrycks
- 2021: Dependent Types for Machine Learning in Dex - David Duvenaud - University of Toronto
  David Duvenaud · AIPLANS 2021
- 2021 Poster: Grounding Representation Similarity Through Statistical Testing
  Frances Ding · Jean-Stanislas Denain · Jacob Steinhardt
- 2021 Poster: Meta-learning to Improve Pre-training
  Aniruddh Raghu · Jonathan Lorraine · Simon Kornblith · Matthew McDermott · David Duvenaud
- 2021 Poster: Learning Equilibria in Matching Markets from Bandit Feedback
  Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt
- 2020: Panel discussion 2
  Danielle S Bassett · Yoshua Bengio · Cristina Savin · David Duvenaud · Anna Choromanska · Yanping Huang
- 2020: Invited Talk David Duvenaud
  David Duvenaud
- 2020 Tutorial: (Track3) Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization Q&A
  David Duvenaud · J. Zico Kolter · Matthew Johnson
- 2020 Poster: What went wrong and when? Instance-wise feature importance for time-series black-box models
  Sana Tonekaboni · Shalmali Joshi · Kieran Campbell · David Duvenaud · Anna Goldenberg
- 2020 Poster: Learning Differential Equations that are Easy to Solve
  Jacob Kelly · Jesse Bettencourt · Matthew Johnson · David Duvenaud
- 2020 Tutorial: (Track3) Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization
  David Duvenaud · J. Zico Kolter · Matthew Johnson
- 2019 Workshop: Program Transformations for ML
  Pascal Lamblin · Atilim Gunes Baydin · Alexander Wiltschko · Bart van Merriënboer · Emily Fertig · Barak Pearlmutter · David Duvenaud · Laurent Hascoet
- 2019: Molecules and Genomes
  David Haussler · Djork-Arné Clevert · Michael Keiser · Alan Aspuru-Guzik · David Duvenaud · David Jones · Jennifer Wei · Alexander D'Amour
- 2019 Poster: Latent Ordinary Differential Equations for Irregularly-Sampled Time Series
  Yulia Rubanova · Tian Qi Chen · David Duvenaud
- 2019 Poster: Residual Flows for Invertible Generative Modeling
  Tian Qi Chen · Jens Behrmann · David Duvenaud · Joern-Henrik Jacobsen
- 2019 Spotlight: Residual Flows for Invertible Generative Modeling
  Tian Qi Chen · Jens Behrmann · David Duvenaud · Joern-Henrik Jacobsen
- 2019 Poster: Efficient Graph Generation with Graph Recurrent Attention Networks
  Renjie Liao · Yujia Li · Yang Song · Shenlong Wang · Will Hamilton · David Duvenaud · Raquel Urtasun · Richard Zemel
- 2019 Poster: Neural Networks with Cheap Differential Operators
  Tian Qi Chen · David Duvenaud
- 2019 Spotlight: Neural Networks with Cheap Differential Operators
  Tian Qi Chen · David Duvenaud
- 2018: Poster Session
  Carl Trimbach · Mennatullah Siam · Rodrigo Toro Icarte · Zhongtian Dai · Sheila McIlraith · Matthew Rahtz · Robert Sheline · Christopher MacLellan · Carolin Lawrence · Stefan Riezler · Dylan Hadfield-Menell · Fang-I Hsiao
- 2018: Software Panel
  Ben Letham · David Duvenaud · Dustin Tran · Aki Vehtari
- 2018 Workshop: Workshop on Security in Machine Learning
  Nicolas Papernot · Jacob Steinhardt · Matt Fredrikson · Kamalika Chaudhuri · Florian Tramer
- 2018 Poster: Isolating Sources of Disentanglement in Variational Autoencoders
  Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud
- 2018 Oral: Isolating Sources of Disentanglement in Variational Autoencoders
  Tian Qi Chen · Xuechen (Chen) Li · Roger Grosse · David Duvenaud
- 2018 Poster: Neural Ordinary Differential Equations
  Tian Qi Chen · Yulia Rubanova · Jesse Bettencourt · David Duvenaud
- 2018 Poster: Semidefinite relaxations for certifying robustness to adversarial examples
  Aditi Raghunathan · Jacob Steinhardt · Percy Liang
- 2018 Oral: Neural Ordinary Differential Equations
  Tian Qi Chen · Yulia Rubanova · Jesse Bettencourt · David Duvenaud
- 2017: Opening Remarks
  Dylan Hadfield-Menell
- 2017: Automatic Chemical Design Using a Data-driven Continuous Representation of Molecules
  David Duvenaud
- 2017 Workshop: Machine Learning and Computer Security
  Jacob Steinhardt · Nicolas Papernot · Bo Li · Chang Liu · Percy Liang · Dawn Song
- 2017 Poster: Inverse Reward Design
  Dylan Hadfield-Menell · Smitha Milli · Pieter Abbeel · Stuart J Russell · Anca Dragan
- 2017 Poster: Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference
  Geoffrey Roeder · Yuhuai Wu · David Duvenaud
- 2017 Oral: Inverse Reward Design
  Dylan Hadfield-Menell · Smitha Milli · Pieter Abbeel · Stuart J Russell · Anca Dragan
- 2017 Poster: Certified Defenses for Data Poisoning Attacks
  Jacob Steinhardt · Pang Wei Koh · Percy Liang
- 2016: Generating Class-conditional Images with Gradient-based Inference
  David Duvenaud
- 2016: David Duvenaud – No more mini-languages: The power of autodiffing full-featured Python
  David Duvenaud
- 2016: Opening Remarks
  Jacob Steinhardt
- 2016 Workshop: Reliable Machine Learning in the Wild
  Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang
- 2016 Poster: Composing graphical models with neural networks for structured representations and fast inference
  Matthew Johnson · David Duvenaud · Alex Wiltschko · Ryan Adams · Sandeep R Datta
- 2016 Poster: Cooperative Inverse Reinforcement Learning
  Dylan Hadfield-Menell · Stuart J Russell · Pieter Abbeel · Anca Dragan
- 2016 Poster: Probing the Compositionality of Intuitive Functions
  Eric Schulz · Josh Tenenbaum · David Duvenaud · Maarten Speekenbrink · Samuel J Gershman
- 2015: *David Duvenaud* Automatic Differentiation: The most criminally underused tool in probabilistic numerics
  David Duvenaud
- 2015 Poster: Convolutional Networks on Graphs for Learning Molecular Fingerprints
  David Duvenaud · Dougal Maclaurin · Jorge Iparraguirre · Rafael Bombarell · Timothy Hirzel · Alan Aspuru-Guzik · Ryan Adams
- 2015 Poster: Learning with Relaxed Supervision
  Jacob Steinhardt · Percy Liang
- 2014 Poster: Probabilistic ODE Solvers with Runge-Kutta Means
  Michael Schober · David Duvenaud · Philipp Hennig
- 2014 Oral: Probabilistic ODE Solvers with Runge-Kutta Means
  Michael Schober · David Duvenaud · Philipp Hennig
- 2012 Poster: Active Learning of Model Evidence Using Bayesian Quadrature
  Michael A Osborne · David Duvenaud · Roman Garnett · Carl Edward Rasmussen · Stephen J Roberts · Zoubin Ghahramani
- 2011 Poster: Additive Gaussian Processes
  David Duvenaud · Hannes Nickisch · Carl Edward Rasmussen