Skip to yearly menu bar Skip to main content


Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

Dilip Arumugam · Satinder Singh

Hall J (level 1) #330

Keywords: [ Exploration ] [ Planning ] [ Bayesian reinforcement learning ] [ Bayes-Adaptive Markov Decision Process ]


The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we establish a computationally-intractable, exact planning algorithm that takes advantage of this measure to show more efficient planning. We then conclude by introducing a specific form of state abstraction with the potential to reduce BAMDP complexity and gives rise to a computationally-tractable, approximate planning algorithm.

Chat is not available.