Timezone: »
Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment’s dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning. As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates. We propose a formulation of the model learning problem based on the value equivalence principle and analyze how the set of feasible solutions is impacted by the choice of policies and functions. Specifically, we show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks, until eventually collapsing to a single point corresponding to a model that perfectly describes the environment. In many problems, directly modelling state-to-state transitions may be both difficult and unnecessary. By leveraging the value-equivalence principle one may find simpler models without compromising performance, saving computation and memory. We illustrate the benefits of value-equivalent model learning with experiments comparing it against more traditional counterparts like maximum likelihood estimation. More generally, we argue that the principle of value equivalence underlies a number of recent empirical successes in RL, such as Value Iteration Networks, the Predictron, Value Prediction Networks, TreeQN, and MuZero, and provides a first theoretical underpinning of those results.
Author Information
Christopher Grimm (University of Michigan)
Andre Barreto (DeepMind)
Satinder Singh (DeepMind)
David Silver (DeepMind)
More from the Same Authors
-
2020 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah -
2020 Poster: Discovering Reinforcement Learning Algorithms »
Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver -
2020 Poster: Value-driven Hindsight Modelling »
Arthur Guez · Fabio Viola · Theophane Weber · Lars Buesing · Steven Kapturowski · Doina Precup · David Silver · Nicolas Heess -
2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online »
Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver -
2020 Poster: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Spotlight: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Poster: A Self-Tuning Actor-Critic Algorithm »
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2020 Poster: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2020 Spotlight: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2019 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Joshua Achiam · Carlos Florensa · Christopher Grimm · Haoran Tang · Vivek Veeriah -
2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates »
Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu -
2019 Poster: Discovery of Useful Questions as Auxiliary Tasks »
Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2019 Demonstration: The Option Keyboard: Combining Skills in Reinforcement Learning »
Daniel Toyama · Shaobo Hou · Gheorghe Comanici · Andre Barreto · Doina Precup · Shibl Mourad · Eser Aygün · Philippe Hamel -
2019 Poster: The Option Keyboard: Combining Skills in Reinforcement Learning »
Andre Barreto · Diana Borsa · Shaobo Hou · Gheorghe Comanici · Eser Aygün · Philippe Hamel · Daniel Toyama · Jonathan hunt · Shibl Mourad · David Silver · Doina Precup -
2019 Poster: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2019 Spotlight: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2018 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas -
2018 Poster: Meta-Gradient Reinforcement Learning »
Zhongwen Xu · Hado van Hasselt · David Silver -
2018 Poster: Fast deep reinforcement learning using online adjustments from the past »
Steven Hansen · Alexander Pritzel · Pablo Sprechmann · Andre Barreto · Charles Blundell -
2017 Symposium: Deep Reinforcement Learning »
Pieter Abbeel · Yan Duan · David Silver · Satinder Singh · Junhyuk Oh · Rein Houthooft -
2017 Poster: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Poster: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Poster: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning »
Marc Lanctot · Vinicius Zambaldi · Audrunas Gruslys · Angeliki Lazaridou · Karl Tuyls · Julien Perolat · David Silver · Thore Graepel -
2017 Poster: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Oral: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2016 Poster: Learning values across many orders of magnitude »
Hado van Hasselt · Arthur Guez · Arthur Guez · Matteo Hessel · Volodymyr Mnih · David Silver -
2015 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · John Schulman · Satinder Singh · David Silver -
2015 Poster: Learning Continuous Control Policies by Stochastic Value Gradients »
Nicolas Heess · Gregory Wayne · David Silver · Timothy Lillicrap · Tom Erez · Yuval Tassa -
2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez -
2014 Poster: Bayes-Adaptive Simulation-based Search with Value Function Approximation »
Arthur Guez · Nicolas Heess · David Silver · Peter Dayan -
2012 Poster: Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search »
Arthur Guez · David Silver · Peter Dayan -
2012 Poster: On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization »
Andre S Barreto · Doina Precup · Joelle Pineau -
2011 Poster: Reinforcement Learning using Kernel-Based Stochastic Factorization »
Andre S Barreto · Doina Precup · Joelle Pineau -
2010 Poster: Monte-Carlo Planning in Large POMDPs »
David Silver · Joel Veness -
2009 Poster: Bootstrapping from Game Tree Search »
Joel Veness · David Silver · William Uther · Alan Blair -
2009 Oral: Bootstrapping from Game Tree Search »
Joel Veness · David Silver · William Uther · Alan Blair -
2009 Poster: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Batnaghar · Doina Precup · David Silver · Richard Sutton -
2009 Spotlight: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Batnaghar · Doina Precup · David Silver · Richard Sutton