Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems, because it avoids expensive applications of Bayes' rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by demonstrating it in an infinite state-space domain, which is qualitatively out of reach of almost all previous work in Bayesian exploration.
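The central idea in the abstract, sampling a model from the current beliefs at the root of each Monte-Carlo simulation and sampling it lazily, can be illustrated with a small sketch. It assumes a discrete MDP with a known reward function and independent Dirichlet beliefs over each state-action pair's next-state distribution; tree nodes are indexed by the action/next-state history, actions inside the tree are chosen with UCB1, and new leaves are evaluated with a uniform-random rollout. The names `bamcp_action`, `LazyModel` and `sample_dirichlet`, and every parameter default, are hypothetical and are not taken from the paper. The point to notice is that the posterior counts are never updated during search: each simulation draws one model at the root and follows it to the end, and a transition row is sampled only when its state-action pair is first reached.

```python
import math
import random
from collections import defaultdict


def sample_dirichlet(alphas, rng):
    # Dirichlet draw via normalised Gamma(alpha, 1) variates.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


class LazyModel:
    """One MDP sampled from the current beliefs (hypothetical helper).
    Transition rows are drawn lazily: P(. | s, a) is sampled only the
    first time (s, a) is visited during a simulation."""

    def __init__(self, counts, n_states, rng):
        self.counts = counts      # (s, a) -> Dirichlet pseudo-counts over next states
        self.n_states = n_states
        self.rng = rng
        self.rows = {}            # cache of transition rows sampled so far

    def step(self, s, a):
        if (s, a) not in self.rows:
            self.rows[(s, a)] = sample_dirichlet(self.counts[(s, a)], self.rng)
        return self.rng.choices(range(self.n_states), weights=self.rows[(s, a)])[0]


def bamcp_action(state, counts, reward, n_states, n_actions,
                 n_sims=2000, depth=15, gamma=0.95, c=1.0, seed=0):
    """Hypothetical root-sampling MCTS sketch; returns the most-visited root action."""
    rng = random.Random(seed)
    N = {(): 0}                  # history -> visit count (root starts in the tree)
    Na = defaultdict(int)        # (history, action) -> visit count
    Q = defaultdict(float)       # (history, action) -> mean discounted return

    def rollout(s, model, d):
        # Uniform-random default policy evaluated under the sampled model.
        ret, disc = 0.0, 1.0
        for _ in range(d):
            a = rng.randrange(n_actions)
            ret += disc * reward(s, a)
            s = model.step(s, a)
            disc *= gamma
        return ret

    def ucb(hist, a):
        if Na[(hist, a)] == 0:
            return float("inf")  # try every action once before exploiting
        return Q[(hist, a)] + c * math.sqrt(math.log(N[hist]) / Na[(hist, a)])

    def simulate(hist, s, model, d):
        if d == 0:
            return 0.0
        if hist not in N:        # new leaf: add it to the tree, estimate by rollout
            N[hist] = 0
            return rollout(s, model, d)
        a = max(range(n_actions), key=lambda b: ucb(hist, b))
        s2 = model.step(s, a)
        # No Bayes-rule update here: the same sampled model is used all the way down.
        ret = reward(s, a) + gamma * simulate(hist + ((a, s2),), s2, model, d - 1)
        N[hist] += 1
        Na[(hist, a)] += 1
        Q[(hist, a)] += (ret - Q[(hist, a)]) / Na[(hist, a)]
        return ret

    for _ in range(n_sims):
        model = LazyModel(counts, n_states, rng)  # root sampling: one model per simulation
        simulate((), state, model, depth)
    return max(range(n_actions), key=lambda a: Na[((), a)])


if __name__ == "__main__":
    # Toy beliefs: state 1 pays reward 1 per step, state 2 is a zero-reward trap.
    # From state 0, the counts say action 1 almost surely reaches state 1
    # and action 0 almost surely falls into the trap.
    toward_1 = [1, 100, 1]
    toward_trap = [1, 1, 100]
    counts = {(0, 0): toward_trap, (0, 1): toward_1,
              (1, 0): toward_1, (1, 1): toward_1,
              (2, 0): toward_trap, (2, 1): toward_trap}

    def reward(s, a):
        return 1.0 if s == 1 else 0.0

    print(bamcp_action(0, counts, reward, n_states=3, n_actions=2))
```

In this toy problem the search should pick action 1 from state 0. Returning the most-visited root action rather than the one with the highest Q-estimate is a common MCTS convention that is less sensitive to a single lucky rollout; it is a choice made for this sketch, not a claim about the paper's implementation.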
Author Information
Arthur Guez (DeepMind)
David Silver (DeepMind)
Peter Dayan (Gatsby Unit, UCL)
I am Director of the Gatsby Computational Neuroscience Unit at University College London. I studied mathematics at the University of Cambridge and then did a PhD at the University of Edinburgh, specialising in associative memory and reinforcement learning. I did postdocs with Terry Sejnowski at the Salk Institute and Geoff Hinton at the University of Toronto, then became an Assistant Professor in Brain and Cognitive Science at the Massachusetts Institute of Technology before moving to UCL.
More from the Same Authors
2021 Spotlight: Proper Value Equivalence
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh
2021 Spotlight: Online and Offline Reinforcement Learning by Planning with a Learned Model
Julian Schrittwieser · Thomas Hubert · Amol Mandhane · Mohammadamin Barekatain · Ioannis Antonoglou · David Silver
2022 Poster: Large-Scale Retrieval for Reinforcement Learning
Peter Humphreys · Arthur Guez · Olivier Tieleman · Laurent Sifre · Theophane Weber · Timothy Lillicrap
2021 Workshop: Deep Reinforcement Learning
Pieter Abbeel · Chelsea Finn · David Silver · Matthew Taylor · Martha White · Srijita Das · Yuqing Du · Andrew Patterson · Manan Tomar · Olivia Watkins
2021: Bootstrapped Meta-Learning
Sebastian Flennerhag · Yannick Schroecker · Tom Zahavy · Hado van Hasselt · David Silver · Satinder Singh
2021 Poster: Proper Value Equivalence
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh
2021 Poster: Discovery of Options via Meta-Learned Subgoals
Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh
2021 Poster: Self-Consistent Models and Values
Greg Farquhar · Kate Baumli · Zita Marinho · Angelos Filos · Matteo Hessel · Hado van Hasselt · David Silver
2021 Poster: Online and Offline Reinforcement Learning by Planning with a Learned Model
Julian Schrittwieser · Thomas Hubert · Amol Mandhane · Mohammadamin Barekatain · Ioannis Antonoglou · David Silver
2020 Workshop: Deep Reinforcement Learning
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah
2020 Poster: Discovering Reinforcement Learning Algorithms
Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver
2020 Poster: Value-driven Hindsight Modelling
Arthur Guez · Fabio Viola · Theophane Weber · Lars Buesing · Steven Kapturowski · Doina Precup · David Silver · Nicolas Heess
2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online
Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver
2020 Poster: A Self-Tuning Actor-Critic Algorithm
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh
2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning
Christopher Grimm · Andre Barreto · Satinder Singh · David Silver
2019: Late-Breaking Papers (Talks)
David Silver · Simon Du · Matthias Plappert
2019 Workshop: Deep Reinforcement Learning
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Joshua Achiam · Carlos Florensa · Christopher Grimm · Haoran Tang · Vivek Veeriah
2019 Poster: Discovery of Useful Questions as Auxiliary Tasks
Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh
2019 Poster: The Option Keyboard: Combining Skills in Reinforcement Learning
Andre Barreto · Diana Borsa · Shaobo Hou · Gheorghe Comanici · Eser Aygün · Philippe Hamel · Daniel Toyama · Jonathan J Hunt · Shibl Mourad · David Silver · Doina Precup
2018: David Silver
David Silver
2018 Workshop: Deep Reinforcement Learning
Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas
2018 Poster: Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models
Amir Dezfouli · Richard Morris · Fabio Ramos · Peter Dayan · Bernard Balleine
2018 Oral: Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models
Amir Dezfouli · Richard Morris · Fabio Ramos · Peter Dayan · Bernard Balleine
2018 Poster: Meta-Gradient Reinforcement Learning
Zhongwen Xu · Hado van Hasselt · David Silver
2017: Panel Discussion
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox
2017: Deep Reinforcement Learning with Subgoals (David Silver)
David Silver
2017 Symposium: Deep Reinforcement Learning
Pieter Abbeel · Yan Duan · David Silver · Satinder Singh · Junhyuk Oh · Rein Houthooft
2017 Poster: Natural Value Approximators: Learning when to Trust Past Estimates
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul
2017 Poster: Successor Features for Transfer in Reinforcement Learning
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt
2017 Poster: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
Marc Lanctot · Vinicius Zambaldi · Audrunas Gruslys · Angeliki Lazaridou · Karl Tuyls · Julien Perolat · David Silver · Thore Graepel
2017 Poster: Imagination-Augmented Agents for Deep Reinforcement Learning
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt
2017 Spotlight: Natural Value Approximators: Learning when to Trust Past Estimates
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul
2017 Oral: Imagination-Augmented Agents for Deep Reinforcement Learning
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra
2016 Poster: Learning values across many orders of magnitude
Hado van Hasselt · Arthur Guez · Matteo Hessel · Volodymyr Mnih · David Silver
2015 Workshop: Deep Reinforcement Learning
Pieter Abbeel · John Schulman · Satinder Singh · David Silver
2015 Poster: Learning Continuous Control Policies by Stochastic Value Gradients
Nicolas Heess · Gregory Wayne · David Silver · Timothy Lillicrap · Tom Erez · Yuval Tassa
2014 Workshop: Novel Trends and Applications in Reinforcement Learning
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
2014 Poster: Bayes-Adaptive Simulation-based Search with Value Function Approximation
Arthur Guez · Nicolas Heess · David Silver · Peter Dayan
2013 Invited Talk: Neural Reinforcement Learning
Peter Dayan
2013 Poster: Correlations strike back (again): the case of associative memory retrieval
Cristina Savin · Peter Dayan · Mate Lengyel
2013 Oral: Correlations strike back (again): the case of associative memory retrieval
Cristina Savin · Peter Dayan · Mate Lengyel
2011 Poster: Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories
Cristina Savin · Peter Dayan · Mate Lengyel
2010 Poster: Monte-Carlo Planning in Large POMDPs
David Silver · Joel Veness
2009 Poster: Bootstrapping from Game Tree Search
Joel Veness · David Silver · William Uther · Alan Blair
2009 Poster: Know Thy Neighbour: A Normative Theory of Synaptic Depression
Jean-Pascal Pfister · Peter Dayan · Mate Lengyel
2009 Oral: Bootstrapping from Game Tree Search
Joel Veness · David Silver · William Uther · Alan Blair
2009 Oral: Know Thy Neighbour: A Normative Theory of Synaptic Depression
Jean-Pascal Pfister · Peter Dayan · Mate Lengyel
2009 Poster: Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing
Ruben Coen-Cagli · Peter Dayan · Odelia Schwartz
2009 Spotlight: Statistical Models of Linear and Nonlinear Contextual Interactions in Early Visual Processing
Ruben Coen-Cagli · Peter Dayan · Odelia Schwartz
2009 Poster: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
Hamid R Maei · Csaba Szepesvari · Shalabh Bhatnagar · Doina Precup · David Silver · Richard Sutton
2009 Spotlight: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
Hamid R Maei · Csaba Szepesvari · Shalabh Bhatnagar · Doina Precup · David Silver · Richard Sutton
2008 Oral: Load and Attentional Bayes
Peter Dayan
2008 Poster: Load and Attentional Bayes
Peter Dayan
2008 Poster: Depression: an RL formulation and a behavioural test
Quentin J Huys · Joshua T Vogelstein · Peter Dayan
2008 Poster: Bayesian Model of Behaviour in Economic Games
Debajyoti Ray · Brooks King-Casas · P. Read Montague · Peter Dayan
2007 Oral: Hippocampal Contributions to Control: The Third Way
Mate Lengyel · Peter Dayan
2007 Poster: Hippocampal Contributions to Control: The Third Way
Mate Lengyel · Peter Dayan
2006 Poster: Uncertainty, phase and oscillatory hippocampal recall
Mate Lengyel · Peter Dayan
2006 Talk: Uncertainty, phase and oscillatory hippocampal recall
Mate Lengyel · Peter Dayan