`

Timezone: »

 
Reducing the Information Horizon of Bayes-Adaptive Markov Decision Processes via Epistemic State Abstraction
Dilip Arumugam · Satinder Singh

The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDPs. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we establish a computationally-intractable, exact planning algorithm that takes advantage of this measure to show improved planning complexity. We then conclude by introducing a specific form of state abstraction that reduces BAMDP complexity and gives rise to a computationally-tractable, approximate planning algorithm.

Author Information

Dilip Arumugam (Stanford University)
Satinder Singh (DeepMind)

More from the Same Authors