Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information
Genevieve Flaspohler, Nicholas Roy, John Fisher III
Poster Session 2
on 2020-12-08, 09:00–11:00 (UTC-08:00)
GatherTown: Reinforcement learning and planning (Town D2 - Spot B3)
Only if the poster is crowded, join Zoom. Authors must start the Zoom call from their Profile page / Presentation History.
Abstract: This work introduces macro-action discovery using the value of information (VoI) for robust and efficient planning in partially observable Markov decision processes (POMDPs). POMDPs are a powerful framework for planning under uncertainty. Previous approaches have used high-level macro-actions within POMDP policies to reduce planning complexity; however, macro-action design is often heuristic and rarely comes with performance guarantees. Here, we present a method for extracting belief-dependent, variable-length macro-actions directly from a low-level POMDP model. We construct macro-actions by chaining sequences of open-loop actions together when the task-specific VoI, defined as the change in expected task performance caused by observations in the current planning iteration, is low. Importantly, we provide performance guarantees on the resulting VoI macro-action policies in the form of bounded regret relative to the optimal policy. In simulated tracking experiments, we achieve higher reward than both closed-loop and hand-coded macro-action baselines, selectively using VoI macro-actions to reduce planning complexity while maintaining near-optimal task performance.
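As a rough, illustrative formalization of the task-specific VoI (the notation below, including the belief update $\tau$ and the values $V^{*}$ and $V_{\mathrm{open}}$, is assumed here for exposition and is not taken from the paper), the VoI of taking action $a$ at belief $b$ can be sketched as

\[
\mathrm{VoI}(b, a) \;=\; \mathbb{E}_{o \sim p(o \mid b, a)}\!\left[ V^{*}\bigl(\tau(b, a, o)\bigr) \right] \;-\; V_{\mathrm{open}}\bigl(\tau(b, a)\bigr),
\]

the gap between the expected value of re-planning closed-loop after receiving observation $o$ and the value of committing to the next action open-loop, where $\tau$ denotes the belief update. When this gap is small, chaining $a$ into the current macro-action forgoes little expected reward, which is the intuition behind the bounded-regret guarantee stated in the abstract.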