Abstract
Partially observable Markov decision processes (POMDPs) provide a powerful model for real-world sequential decision-making problems. In recent years, point-based value iteration methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to POMDPs when an initial set of belief states is known. However, no point-based work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of possible observations, only a finite number of observation partitionings are relevant for optimal decision-making when a finite, fixed set of reachable belief states is known. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous-state MDPs can be generalized to continuous-state POMDPs with discrete observations, and (2) we show how this solution can be further extended via recently developed symbolic methods to continuous states and observations, deriving the minimal relevant observation partitioning for potentially correlated, multivariate observation spaces. We demonstrate proof-of-concept results on steam plant control problems with uni- and multivariate state and observation spaces.
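To make the backup operation concrete, below is a minimal numpy sketch of the standard point-based value iteration backup for a *discrete* POMDP, the operation this paper generalizes to continuous state and observation spaces. All names and array layouts (point_based_backup, T, O, R, Gamma) are illustrative assumptions, not the paper's implementation. The per-observation argmax over alpha-vectors is the step that partitions the observation space: with finitely many observations it is enumerated directly, whereas the paper derives the analogous partition in closed form for continuous observations via symbolic case operations.

```python
import numpy as np

def point_based_backup(beliefs, Gamma, T, O, R, gamma):
    """One point-based backup over a fixed, finite set of belief points.

    beliefs : (B, S) array; each row is a belief over S states
    Gamma   : (K, S) array of current alpha-vectors
    T       : (A, S, S) transition model, T[a, s, s']
    O       : (A, S, Z) observation model, O[a, s', z]
    R       : (A, S) reward model, R[a, s]
    gamma   : discount factor in [0, 1)
    Returns : (B, S) array with one backed-up alpha-vector per belief point.
    """
    A, S, _ = T.shape
    Z = O.shape[2]
    new_Gamma = np.empty_like(beliefs)
    for i, b in enumerate(beliefs):
        best_val = -np.inf
        for a in range(A):
            # g[z, k, s] = sum_{s'} T[a, s, s'] * O[a, s', z] * Gamma[k, s']
            g = np.einsum('st,tz,kt->zks', T[a], O[a], Gamma)
            # For each observation z, the argmax over alpha-vectors k selects
            # one region of the (here finite) observation partition.
            best_k = np.argmax(g @ b, axis=1)                    # shape (Z,)
            alpha_a = R[a] + gamma * g[np.arange(Z), best_k].sum(axis=0)
            if b @ alpha_a > best_val:
                best_val, new_Gamma[i] = b @ alpha_a, alpha_a
    return new_Gamma

# Tiny smoke test with random models (illustrative only).
rng = np.random.default_rng(0)
A, S, Z, B, K = 2, 3, 4, 5, 6
T = rng.dirichlet(np.ones(S), size=(A, S))       # T[a, s, :] sums to 1
O = rng.dirichlet(np.ones(Z), size=(A, S))       # O[a, s', :] sums to 1
R = rng.standard_normal((A, S))
Gamma = rng.standard_normal((K, S))
beliefs = rng.dirichlet(np.ones(S), size=B)
print(point_based_backup(beliefs, Gamma, T, O, R, 0.95).shape)   # (5, 3)
```

In the continuous setting of the paper, T, O, and R become piecewise symbolic functions and the per-observation argmax is computed by symbolic case maximization rather than enumeration over Z.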
Author Information
Zahra Zamani (ANU and NICTA)
Scott Sanner (University of Toronto)
Pascal Poupart (University of Waterloo)
Kristian Kersting (University of Bonn and Fraunhofer IAIS)
More from the Same Authors
- 2022: Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus
  Yudong Xu · Elias Khalil · Scott Sanner
- 2022 Poster: Learning to Follow Instructions in Text-Based Games
  Mathieu Tuli · Andrew Li · Pashootan Vaezipoor · Toryn Klassen · Scott Sanner · Sheila McIlraith
- 2021 Poster: Risk-Aware Transfer in Reinforcement Learning using Successor Features
  Michael Gimelfarb · Andre Barreto · Scott Sanner · Chi-Guhn Lee
- 2021 Poster: Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models
  Yi Sui · Ga Wu · Scott Sanner
- 2014 Poster: Mind the Nuisance: Gaussian Process Classification using Privileged Noise
  Daniel Hernández-Lobato · Viktoriia Sharmanska · Kristian Kersting · Christoph Lampert · Novi Quadrianto
- 2012 Poster: Cost-Sensitive Exploration in Bayesian Reinforcement Learning
  Dongho Kim · Kee-Eung Kim · Pascal Poupart
- 2011 Workshop: Choice Models and Preference Learning
  Jean-Marc Andreoli · Cedric Archambeau · Guillaume Bouchard · Shengbo Guo · Kristian Kersting · Scott Sanner · Martin Szummer · Paolo Viappiani · Onno Zoeter
- 2011 Poster: Automated Refinement of Bayes Networks' Parameters based on Test Ordering Constraints
  Omar Z Khan · Pascal Poupart · John Agosta
- 2010 Workshop: Machine Learning for Assistive Technologies
  Jesse Hoey · Pascal Poupart · Thomas Ploetz
- 2010 Session: Spotlights Session 8
  Pascal Poupart
- 2010 Session: Oral Session 9
  Pascal Poupart
- 2010 Poster: Gaussian Process Preference Elicitation
  Edwin Bonilla · Shengbo Guo · Scott Sanner
- 2009 Mini Symposium: Partially Observable Reinforcement Learning
  Marcus Hutter · Will Uther · Pascal Poupart
- 2008 Workshop: Model Uncertainty and Risk in Reinforcement Learning
  Yaakov Engel · Mohammad Ghavamzadeh · Shie Mannor · Pascal Poupart
- 2006 Poster: Automated Hierarchy Discovery for Planning in Partially Observable Domains
  Laurent Charlin · Pascal Poupart · Romy Shioda