Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
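To make the idea concrete, below is a minimal sketch of a future-task auxiliary reward with a baseline filter, under assumed notation (this is not the paper's exact formulation): `task_values[g]` stands in for the optimal value function of a possible future task g, `task_probs` for a distribution over such tasks, and `baseline_state` for the state reached by the default policy (such as doing nothing). Tasks that the baseline cannot achieve are excluded, so the agent has no incentive to interfere to make extra tasks achievable.

```python
def future_task_reward(state, baseline_state, task_values, task_probs):
    """Hypothetical sketch: expected achievability of future tasks.

    Future tasks that are unachievable from the baseline state are
    filtered out, so interference with the environment (making tasks
    achievable that would not be achievable by default) earns nothing,
    while side effects during the current task reduce the reward.
    """
    reward = 0.0
    for g, p in task_probs.items():
        if task_values[g](baseline_state) > 0:  # achievable by default
            reward += p * task_values[g](state)
    return reward

# Toy usage (illustrative names only): breaking a vase is a side effect
# that makes the "deliver" future task unachievable for the agent, while
# the baseline (doing nothing) leaves the vase intact.
task_values = {
    "deliver": lambda s: 1.0 if s["vase_intact"] else 0.0,
    "block_human": lambda s: 0.0,  # never achievable by default: filtered out
}
task_probs = {"deliver": 0.5, "block_human": 0.5}
s_agent = {"vase_intact": False}    # agent caused a side effect
s_baseline = {"vase_intact": True}  # default course of action
print(future_task_reward(s_agent, s_baseline, task_values, task_probs))  # 0.0
```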
Author Information
Victoria Krakovna (DeepMind)
Laurent Orseau (DeepMind)
Richard Ngo (University of Cambridge)
Miljan Martic (DeepMind)
Shane Legg (DeepMind)
More from the Same Authors
- 2022 Workshop: Workshop on Machine Learning Safety
  Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
- 2021: Artificial what?
  Shane Legg
- 2020 Poster: Meta-trained agents implement Bayes-optimal agents
  Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
- 2020 Spotlight: Meta-trained agents implement Bayes-optimal agents
  Vladimir Mikulik · Grégoire Delétang · Tom McGrath · Tim Genewein · Miljan Martic · Shane Legg · Pedro Ortega
- 2020 Poster: Logarithmic Pruning is All You Need
  Laurent Orseau · Marcus Hutter · Omar Rivasplata
- 2020 Spotlight: Logarithmic Pruning is All You Need
  Laurent Orseau · Marcus Hutter · Omar Rivasplata
- 2018 Poster: Single-Agent Policy Tree Search With Guarantees
  Laurent Orseau · Levi Lelis · Tor Lattimore · Theophane Weber
- 2018 Poster: Reward learning from human preferences and demonstrations in Atari
  Borja Ibarz · Jan Leike · Tobias Pohlen · Geoffrey Irving · Shane Legg · Dario Amodei
- 2017 Poster: Deep Reinforcement Learning from Human Preferences
  Paul Christiano · Jan Leike · Tom Brown · Miljan Martic · Shane Legg · Dario Amodei
- 2007 Poster: Temporal Difference with Eligibility Traces Derived from First Principles
  Marcus Hutter · Shane Legg