If capable AI agents are generally incentivized to seek power in service of the objectives we specify for them, then these systems will pose enormous risks, in addition to enormous benefits. In fully observable environments, most reward functions have an optimal policy which seeks power by keeping options open and staying alive. However, the real world is neither fully observable, nor must trained agents be even approximately reward-optimal. We consider a range of models of AI decision-making, from optimal, to random, to choices informed by learning and interacting with an environment. We discover that many decision-making functions are retargetable, and that retargetability is sufficient to cause power-seeking tendencies. Our functional criterion is simple and broad. We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power. We demonstrate the flexibility of our results by reasoning about learned policy incentives in Montezuma's Revenge. These results suggest a safety risk: Eventually, retargetable training procedures may train real-world agents which seek power over humans.
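The abstract's central claim, that most goals favor the option-preserving action, can be illustrated with a toy sketch (not the paper's formalism; the environment, outcome names, and reward values below are illustrative assumptions). One action keeps two terminal outcomes reachable, another locks in a single outcome; counting over all permutations of a fixed reward vector shows that the majority of "retargetings" of the goal pick the option-preserving action:

```python
import itertools

# Toy illustration (assumed setup, not the paper's formal environment):
# action "wait" keeps outcomes {A, B} reachable; "quit" locks in {C}.
OUTCOMES_IF_WAIT = ["A", "B"]
OUTCOMES_IF_QUIT = ["C"]

def optimal_action(reward):
    """Pick the action whose best reachable outcome has the highest reward."""
    best_wait = max(reward[o] for o in OUTCOMES_IF_WAIT)
    best_quit = max(reward[o] for o in OUTCOMES_IF_QUIT)
    return "wait" if best_wait > best_quit else "quit"

# Retarget the goal by permuting a fixed reward vector over the outcomes,
# and count how often the optimal policy keeps its options open.
values = [0.0, 1.0, 2.0]
counts = {"wait": 0, "quit": 0}
for perm in itertools.permutations(values):
    reward = dict(zip(["A", "B", "C"], perm))
    counts[optimal_action(reward)] += 1

print(counts)  # {'wait': 4, 'quit': 2}: most retargetings favor keeping options open
```

Here 4 of the 6 goal permutations choose `wait`, mirroring the claim that power-seeking is favored for most ways of retargeting the decision-maker, not for one particular objective.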
Author Information
Alex Turner (Oregon State University)
Prasad Tadepalli (Oregon State University)
More from the Same Authors
- 2021 Spotlight: Optimal Policies Tend To Seek Power »
  Alex Turner · Logan Smith · Rohin Shah · Andrew Critch · Prasad Tadepalli
- 2021: Deep RePReL -- Combining Planning and Deep RL for acting in relational domains »
  Harsha Kokel · Arjun Manoharan · Sriraam Natarajan · Balaraman Ravindran · Prasad Tadepalli
- 2022: Formalizing the Problem of Side Effect Regularization »
  Alex Turner · Aseem Saxena · Prasad Tadepalli
- 2021 Poster: One Explanation is Not Enough: Structured Attention Graphs for Image Classification »
  Vivswan Shitole · Fuxin Li · Minsuk Kahng · Prasad Tadepalli · Alan Fern
- 2021 Poster: Optimal Policies Tend To Seek Power »
  Alex Turner · Logan Smith · Rohin Shah · Andrew Critch · Prasad Tadepalli
- 2020 Poster: Avoiding Side Effects in Complex Environments »
  Alex Turner · Neale Ratzlaff · Prasad Tadepalli
- 2020 Spotlight: Avoiding Side Effects in Complex Environments »
  Alex Turner · Neale Ratzlaff · Prasad Tadepalli
- 2013 Poster: Symbolic Opportunistic Policy Iteration for Factored-Action MDPs »
  Aswin Raghavan · Roni Khardon · Alan Fern · Prasad Tadepalli
- 2012 Poster: A Bayesian Approach for Policy Learning from Trajectory Preference Queries »
  Aaron Wilson · Alan Fern · Prasad Tadepalli
- 2011 Poster: Autonomous Learning of Action Models for Planning »
  Neville Mehta · Prasad Tadepalli · Alan Fern
- 2011 Poster: Inverting Grice's Maxims to Learn Rules from Natural Language Extractions »
  M. Shahed Sorower · Thomas Dietterich · Janardhan Rao Doppa · Walker Orr · Prasad Tadepalli · Xiaoli Fern
- 2010 Poster: A Computational Decision Theory for Interactive Assistants »
  Alan Fern · Prasad Tadepalli