
Aligned Artificial Intelligence
Dylan Hadfield-Menell · Jacob Steinhardt · David Duvenaud · David Krueger · Anca Dragan

Sat Dec 09 08:00 AM -- 06:30 PM (PST) @ 201 B
Event URL: https://sites.google.com/site/alignedainips2017/

To be helpful to users and to society at large, an autonomous agent needs to be aligned with the objectives of its stakeholders. Misaligned incentives are a common and crucial problem with human agents; we should expect similar challenges to arise from misaligned incentives with artificial agents. For example, it is not uncommon to see reinforcement learning agents ‘hack’ their specified reward function. How do we build learning systems that will reliably achieve a user's intended objective? How can we ensure that autonomous agents behave reliably in unforeseen situations? How do we design systems whose behavior will be aligned with the values and goals of society at large? As AI capabilities develop, it is crucial for the AI community to arrive at satisfying and trustworthy answers to these questions. This workshop will focus on three central challenges in value alignment: learning complex rewards that reflect human preferences (e.g. meaningful oversight, preference elicitation, inverse reinforcement learning, learning from demonstration or feedback), engineering reliable AI systems (e.g. robustness to distributional shift, model misspecification, or adversarial data, via methods such as adversarial training, KWIK-style learning, or transparency to human inspection), and dealing with bounded rationality and incomplete information in both AI systems and their users (e.g. acting on incomplete task specifications, learning from users who sometimes make mistakes). We also welcome submissions that do not directly fit these categories but deal more generally with problems relating to value alignment in artificial intelligence.

Sat 9:15 a.m. - 9:30 a.m.
Opening Remarks (Talk)
Dylan Hadfield-Menell
Sat 9:30 a.m. - 9:35 a.m.
Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning (Talk)
Hadrien Hendrikx
Sat 9:45 a.m. - 10:15 a.m.
Minimax-Regret Querying on Side Effects in Factored Markov Decision Processes (Talk)
Satinder Singh
Sat 10:15 a.m. - 10:30 a.m.
Robust Covariate Shift with Exact Loss Functions (Contributed Talk)
Angie Liu
Sat 11:00 a.m. - 11:30 a.m.
Adversarial Robustness for Aligned AI (Talk)
Ian Goodfellow
Sat 11:30 a.m. - 12:00 p.m.
Incomplete Contracting and AI Alignment (Talk)
Gillian Hadfield
Sat 1:15 p.m. - 1:45 p.m.
Learning from Human Feedback (Talk)
Paul Christiano
Sat 1:45 p.m. - 2:00 p.m.
Finite Supervision Reinforcement Learning (Contributed Talk)
William Saunders, Eric Langlois
Sat 2:00 p.m. - 2:15 p.m.
Safer Classification by Synthesis (Contributed Talk)
William Wang
Sat 2:15 p.m. - 3:00 p.m.
Aligned AI Poster Session (Poster Session)
Amanda Askell, Rafal Muszynski, William Wang, Yaodong Yang, Quoc Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet, Candice Schumann, Angie Liu, Peter Eckersley, Angelina Wang, William Saunders
Sat 3:30 p.m. - 4:00 p.m.
Machine Learning for Human Deliberative Judgment (Talk)
Owain Evans
Sat 4:00 p.m. - 4:30 p.m.
Learning Reward Functions (Talk)
Jan Leike
Sat 4:30 p.m. - 5:00 p.m.
Informal Technical Discussion: Open Problems in AI Alignment (Discussion)

Author Information

Dylan Hadfield-Menell (UC Berkeley)
Jacob Steinhardt (UC Berkeley)
David Duvenaud (University of Toronto)

David Duvenaud is an assistant professor in computer science at the University of Toronto. His research focuses on continuous-time models, latent-variable models, and deep learning. He completed his postdoc at Harvard University and his Ph.D. at the University of Cambridge. David also co-founded Invenia, an energy forecasting and trading company.

David Krueger (Montreal Institute for Learning Algorithms)
Anca Dragan (UC Berkeley)
