AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: agents must weigh “how much of a mess they make” against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework [Shah et al., 2021]. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
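For intuition only, the following is a minimal sketch (not the paper's formal criterion) of a reward that trades off a proxy objective against preserved ability to achieve a set of auxiliary future tasks. The names `regularized_reward`, `aux_values`, and `lam` are hypothetical and introduced here purely for illustration.

```python
import numpy as np

def regularized_reward(proxy_reward, aux_values, state, next_state, lam=0.1):
    """Illustrative trade-off: proxy reward minus a penalty for lost ability
    to achieve a range of future (auxiliary) tasks. Hypothetical sketch only.

    proxy_reward: callable (state, next_state) -> float, the imperfect proxy
    aux_values:   iterable of callables, each state -> float, e.g. the optimal
                  value of one auxiliary task started from that state
    lam:          weight on the ability-preservation term
    """
    # Average change, across auxiliary tasks, in how well the agent could
    # pursue each task from the new state versus the old state.
    ability_change = np.mean([v(next_state) - v(state) for v in aux_values])
    # Only penalize reductions in ability ("making a mess"), not gains.
    penalty = max(0.0, -ability_change)
    return proxy_reward(state, next_state) - lam * penalty
```

Under this assumed setup, setting `lam = 0` recovers pure proxy optimization, while larger `lam` increasingly favors actions that leave the agent's options for future tasks intact.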
Author Information
Alex Turner (Oregon State University)
Aseem Saxena (Oregon State University)
https://aseembits93.github.io/
Prasad Tadepalli (Oregon State University)
More from the Same Authors
- 2021 Spotlight: Optimal Policies Tend To Seek Power
  Alex Turner · Logan Smith · Rohin Shah · Andrew Critch · Prasad Tadepalli
- 2022 Poster: Parametrically Retargetable Decision-Makers Tend To Seek Power
  Alex Turner · Prasad Tadepalli
- 2021 Poster: Optimal Policies Tend To Seek Power
  Alex Turner · Logan Smith · Rohin Shah · Andrew Critch · Prasad Tadepalli
- 2020 Poster: Avoiding Side Effects in Complex Environments
  Alex Turner · Neale Ratzlaff · Prasad Tadepalli
- 2020 Spotlight: Avoiding Side Effects in Complex Environments
  Alex Turner · Neale Ratzlaff · Prasad Tadepalli