The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we present a computationally intractable, exact planning algorithm that leverages this measure to achieve more efficient planning. We conclude by introducing a specific form of state abstraction that has the potential to reduce BAMDP complexity and gives rise to a computationally tractable, approximate planning algorithm.
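To make the planning problem concrete, below is a minimal sketch of exact finite-horizon Bayes-adaptive planning in a small tabular MDP, not the paper's algorithm or its complexity measure. It assumes a known reward table and unknown transition dynamics with independent Dirichlet posteriors per state-action pair; the environment sizes, prior, and reward table `R` are illustrative assumptions. The hyper-state is the pair (physical state, posterior counts), and the recursion over posterior updates shows why exact BAMDP planning scales exponentially with the horizon.

```python
# Minimal sketch of exact finite-horizon Bayes-adaptive planning in a small
# tabular MDP (illustrative; NOT the algorithm proposed in the paper).
# The agent is uncertain only about transitions and keeps an independent
# Dirichlet posterior over next states for every (state, action) pair.
import numpy as np

N_STATES, N_ACTIONS = 3, 2
GAMMA = 0.95

# Hypothetical known reward table R[s, a]; only the dynamics are unknown.
R = np.random.default_rng(0).uniform(size=(N_STATES, N_ACTIONS))


def bayes_optimal_value(s, counts, horizon):
    """Bayes-optimal value of the hyper-state (s, counts) over `horizon` steps.

    counts[s, a] holds Dirichlet parameters over next states, so the posterior
    predictive for s' given (s, a) is counts[s, a, s'] / counts[s, a].sum().
    """
    if horizon == 0:
        return 0.0
    best = -np.inf
    for a in range(N_ACTIONS):
        pred = counts[s, a] / counts[s, a].sum()  # posterior predictive
        q = R[s, a]
        for s_next in range(N_STATES):
            # Bayes' rule for a Dirichlet posterior: observing the transition
            # (s, a, s_next) increments one count, giving the next hyper-state.
            counts[s, a, s_next] += 1.0
            q += GAMMA * pred[s_next] * bayes_optimal_value(s_next, counts, horizon - 1)
            counts[s, a, s_next] -= 1.0  # backtrack to the current posterior
        best = max(best, q)
    return best


# Uniform Dirichlet(1, ..., 1) prior for every (s, a). Even at horizon 4 the
# recursion already expands (N_ACTIONS * N_STATES)^4 hyper-states: the
# exponential blow-up that exact BAMDP planning must contend with.
prior_counts = np.ones((N_STATES, N_ACTIONS, N_STATES))
print(bayes_optimal_value(s=0, counts=prior_counts, horizon=4))
```

The Dirichlet update itself is cheap (increment one count, then backtrack); the intractability comes entirely from the branching over (action, next-state) pairs at every step, which is the information-gathering difficulty the paper's complexity measure is meant to capture.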
Author Information
Dilip Arumugam (Stanford University)
Satinder Singh (DeepMind)
More from the Same Authors
- 2021 Spotlight: Proper Value Equivalence
  Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh
- 2021 Spotlight: Reward is enough for convex MDPs
  Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh
- 2021: GrASP: Gradient-Based Affordance Selection for Planning
  Vivek Veeriah · Zeyu Zheng · Richard L Lewis · Satinder Singh
- 2022: In-Context Policy Iteration
  Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh
- 2022: In-context Reinforcement Learning with Algorithm Distillation
  Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih
- 2022: On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning
  Dilip Arumugam · Mark Ho · Noah Goodman · Benjamin Van Roy
- 2022: Optimistic Meta-Gradients
  Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh
- 2022: In the ZONE: Measuring difficulty and progression in curriculum generation
  Rose Wang · Jesse Mu · Dilip Arumugam · Natasha Jaques · Noah Goodman
- 2023 Poster: Optimistic Meta-Gradients
  Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh
- 2023 Poster: A Definition of Continual Reinforcement Learning
  David Abel · Andre Barreto · Benjamin Van Roy · Doina Precup · Hado van Hasselt · Satinder Singh
- 2023 Poster: Large Language Models can Implement Policy Iteration
  Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh
- 2023 Poster: Discovering Representations for Transfer with Successor Features and the Deep Option Keyboard
  Wilka Carvalho · Andre Saraiva · Angelos Filos · Andrew Lampinen · Loic Matthey · Richard L Lewis · Honglak Lee · Satinder Singh · Danilo Jimenez Rezende · Daniel Zoran
- 2023 Poster: Structured State Space Models for In-Context Reinforcement Learning
  Chris Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani
- 2022 Poster: Palm up: Playing in the Latent Manifold for Unsupervised Pretraining
  Hao Liu · Tom Zahavy · Volodymyr Mnih · Satinder Singh
- 2022 Poster: Approximate Value Equivalence
  Christopher Grimm · Andre Barreto · Satinder Singh
- 2022 Poster: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
  Dilip Arumugam · Benjamin Van Roy
- 2021: Reducing the Information Horizon of Bayes-Adaptive Markov Decision Processes via Epistemic State Abstraction
  Dilip Arumugam · Satinder Singh
- 2021: Bootstrapped Meta-Learning
  Sebastian Flennerhag · Yannick Schroecker · Tom Zahavy · Hado van Hasselt · David Silver · Satinder Singh
- 2021 Poster: On the Expressivity of Markov Reward
  David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh
- 2021 Poster: The Value of Information When Deciding What to Learn
  Dilip Arumugam · Benjamin Van Roy
- 2021 Poster: Reward is enough for convex MDPs
  Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh
- 2021 Poster: Proper Value Equivalence
  Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh
- 2021 Poster: Discovery of Options via Meta-Learned Subgoals
  Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh
- 2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions
  Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh
- 2021 Oral: On the Expressivity of Markov Reward
  David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh
- 2020 Poster: Discovering Reinforcement Learning Algorithms
  Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver
- 2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online
  Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver
- 2020 Poster: Learning to Play No-Press Diplomacy with Best Response Policy Iteration
  Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach
- 2020 Spotlight: Learning to Play No-Press Diplomacy with Best Response Policy Iteration
  Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach
- 2020 Poster: A Self-Tuning Actor-Critic Algorithm
  Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh
- 2020 Poster: On Efficiency in Hierarchical Reinforcement Learning
  Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh
- 2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning
  Christopher Grimm · Andre Barreto · Satinder Singh · David Silver
- 2020 Spotlight: On Efficiency in Hierarchical Reinforcement Learning
  Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh
- 2019 Poster: Hindsight Credit Assignment
  Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos
- 2019 Spotlight: Hindsight Credit Assignment
  Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos