Optimal Policies Tend To Seek Power

Abstract

Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of the objectives we specify for them. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.
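The abstract's central claim, that most reward functions make it optimal to keep a range of options available, can be illustrated with a toy numerical sketch. This example is not from the paper: the three-terminal MDP and the uniform reward distribution are illustrative assumptions. In a start state, action A leads to a single terminal state (e.g. shutdown), while action B leads to a state from which two terminal states remain reachable. With rewards drawn i.i.d. uniformly, the two-option branch is optimal whenever the best of its two terminal rewards beats the single terminal reward, which happens for roughly 2/3 of sampled reward functions.

```python
import random

def fraction_preferring_more_options(trials=100_000, seed=0):
    """Estimate how often a randomly drawn reward function makes it
    optimal to take the branch that keeps two terminal states
    reachable, rather than the branch with a single terminal state.

    Toy MDP (an illustrative assumption, not the paper's construction):
      start --A--> t1            (one option, e.g. shutdown)
      start --B--> s1 --> t2|t3  (two options)
    Terminal rewards r1, r2, r3 are i.i.d. Uniform[0, 1].
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        r1, r2, r3 = (rng.random() for _ in range(3))
        # The optimal policy takes the two-option branch iff the best
        # reward reachable through it exceeds the single-option reward.
        if max(r2, r3) > r1:
            wins += 1
    return wins / trials
```

Running `fraction_preferring_more_options()` returns a value close to 2/3, matching the analytic probability that the maximum of two i.i.d. uniform draws exceeds a third. This mirrors, in miniature, the paper's statistical statement: the tendency is about most reward functions, not every reward function.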
Author Information
Alex Turner (Oregon State University)
Logan Smith (MSU)
Rohin Shah (DeepMind)
Rohin is a Research Scientist on the technical AGI safety team at DeepMind. He completed his PhD at the Center for Human-Compatible AI at UC Berkeley, where he worked on building AI systems that can learn to assist a human user even when they don't initially know what the user wants. He is particularly interested in big-picture questions about artificial intelligence: What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? He writes up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.
Andrew Critch (UC Berkeley)
Prasad Tadepalli (Oregon State University)
Related Events (a corresponding poster, oral, or spotlight)

- 2021 Spotlight: Optimal Policies Tend To Seek Power
More from the Same Authors

- 2021: An Empirical Investigation of Representation Learning for Imitation
  Cynthia Chen · Sam Toyer · Cody Wild · Scott Emmons · Ian Fischer · Kuang-Huei Lee · Neel Alex · Steven Wang · Ping Luo · Stuart Russell · Pieter Abbeel · Rohin Shah
- 2021: Deep RePReL--Combining Planning and Deep RL for acting in relational domains
  Harsha Kokel · Arjun Manoharan · Sriraam Natarajan · Balaraman Ravindran · Prasad Tadepalli
- 2022: Formalizing the Problem of Side Effect Regularization
  Alex Turner · Aseem Saxena · Prasad Tadepalli
- 2022 Poster: Parametrically Retargetable Decision-Makers Tend To Seek Power
  Alex Turner · Prasad Tadepalli
- 2021: NeurIPS RL Competitions Results Presentations
  Rohin Shah · Liam Paull · Tabitha Lee · Tim Rocktäschel · Heinrich Küttler · Sharada Mohanty · Manuel Wuethrich
- 2021: BASALT: A MineRL Competition on Solving Human-Judged Task + Q&A
  Rohin Shah · Cody Wild · Steven Wang · Neel Alex · Brandon Houghton · William Guss · Sharada Mohanty · Stephanie Milani · Nicholay Topin · Pieter Abbeel · Stuart Russell · Anca Dragan
- 2021: Diamond: A MineRL Competition on Training Sample-Efficient Agents + Q&A
  William Guss · Alara Dirik · Byron Galbraith · Brandon Houghton · Anssi Kanervisto · Noboru Kuno · Stephanie Milani · Sharada Mohanty · Karolis Ramanauskas · Ruslan Salakhutdinov · Rohin Shah · Nicholay Topin · Steven Wang · Cody Wild
- 2021 Poster: One Explanation is Not Enough: Structured Attention Graphs for Image Classification
  Vivswan Shitole · Fuxin Li · Minsuk Kahng · Prasad Tadepalli · Alan Fern
- 2020: Spotlight Talk: Benefits of Assistance over Reward Learning
  Rohin Shah
- 2020 Poster: Avoiding Side Effects in Complex Environments
  Alex Turner · Neale Ratzlaff · Prasad Tadepalli
- 2020 Spotlight: Avoiding Side Effects in Complex Environments
  Alex Turner · Neale Ratzlaff · Prasad Tadepalli
- 2020 Poster: The MAGICAL Benchmark for Robust Imitation
  Sam Toyer · Rohin Shah · Andrew Critch · Stuart Russell
- 2020 Poster: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2020 Oral: Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
  Michael Dennis · Natasha Jaques · Eugene Vinitsky · Alexandre Bayen · Stuart Russell · Andrew Critch · Sergey Levine
- 2019 Poster: On the Utility of Learning about Humans for Human-AI Coordination
  Micah Carroll · Rohin Shah · Mark Ho · Tom Griffiths · Sanjit Seshia · Pieter Abbeel · Anca Dragan
- 2018 Poster: Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making
  Nishant Desai · Andrew Critch · Stuart J Russell
- 2013 Poster: Symbolic Opportunistic Policy Iteration for Factored-Action MDPs
  Aswin Raghavan · Roni Khardon · Alan Fern · Prasad Tadepalli
- 2012 Poster: A Bayesian Approach for Policy Learning from Trajectory Preference Queries
  Aaron Wilson · Alan Fern · Prasad Tadepalli
- 2011 Poster: Autonomous Learning of Action Models for Planning
  Neville Mehta · Prasad Tadepalli · Alan Fern
- 2011 Poster: Inverting Grice's Maxims to Learn Rules from Natural Language Extractions
  M. Shahed Sorower · Thomas Dietterich · Janardhan Rao Doppa · Walker Orr · Prasad Tadepalli · Xiaoli Fern
- 2010 Poster: A Computational Decision Theory for Interactive Assistants
  Alan Fern · Prasad Tadepalli