Poster

Reference-Based POMDPs

Edward Kim · Yohan Karunanayake · Hanna Kurniawati

Great Hall & Hall B1+B2 (level 1) #1418
Tue 12 Dec 3:15 p.m. PST — 5:15 p.m. PST

Abstract:

Making good decisions in partially observable and non-deterministic scenarios is a crucial capability for robots. The Partially Observable Markov Decision Process (POMDP) is a general framework for such problems. Despite advances in POMDP solving, problems with long planning horizons and evolving environments remain difficult to solve, even for today's best approximate solvers. To alleviate this difficulty, we propose a modified POMDP problem, called a Reference-Based POMDP, whose objective function balances maximizing the expected total reward against staying close to a given reference (stochastic) policy. The optimal policy of a Reference-Based POMDP can be computed via iterative expectations under the given reference policy, thereby avoiding exhaustive enumeration of actions at each belief node of the search tree. We show theoretically that, under suitable conditions, the standard POMDP under stochastic policies is related to the Reference-Based POMDP. To demonstrate the feasibility of exploiting the Reference-Based POMDP formulation, we present a basic algorithm, RefSolver. Results from experiments on long-horizon navigation problems indicate that this basic algorithm substantially outperforms POMCP.
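The central computational idea above is that the optimal policy is obtained from expectations under the reference policy rather than from a max over all actions. The following is a minimal sketch of how that can work, under assumptions that do not come from this abstract: the "closeness" term is taken to be a KL-divergence penalty weighted by a temperature `eta`, and the problem is a small, fully observed tabular one in which each state stands in for a belief node. This is generic KL-regularized value iteration, not the paper's exact formulation and not RefSolver. The function `kl_regularized_value_iteration` and its parameters are hypothetical.

```python
import numpy as np


def kl_regularized_value_iteration(R, T, ref_policy, gamma=0.95, eta=1.0,
                                   max_iters=1000, tol=1e-10):
    """Value iteration for E[sum_t gamma^t (R - eta * KL(pi || ref))].

    Hypothetical illustrative sketch (assumed KL penalty; not RefSolver).
      R          (S, A)    immediate rewards
      T          (S, A, S) transition probabilities T[s, a, s']
      ref_policy (S, A)    reference action probabilities; rows sum to 1
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        Q = R + gamma * (T @ V)                   # (S, A) one-step lookahead
        m = Q.max(axis=1, keepdims=True)          # shift for numerical stability
        # Soft backup: V(s) = eta * log E_{a ~ ref(.|s)}[exp(Q(s, a) / eta)],
        # an expectation under the reference policy rather than a max over actions.
        V_new = m[:, 0] + eta * np.log(
            np.sum(ref_policy * np.exp((Q - m) / eta), axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (T @ V)
    # The optimal policy reweights the reference:
    # pi*(a|s) proportional to ref(a|s) * exp(Q(s, a) / eta).
    w = ref_policy * np.exp((Q - Q.max(axis=1, keepdims=True)) / eta)
    return V, w / w.sum(axis=1, keepdims=True)


# Toy 2-state, 2-action problem: action 0 stays, action 1 switches state;
# any action taken in state 1 earns reward 1.
S, A = 2, 2
T = np.zeros((S, A, S))
for s in range(S):
    T[s, 0, s] = 1.0
    T[s, 1, 1 - s] = 1.0
R = np.array([[0.0, 0.0], [1.0, 1.0]])
uniform = np.full((S, A), 0.5)

# Large eta: staying close to the reference dominates the objective.
_, pi_close = kl_regularized_value_iteration(R, T, uniform, gamma=0.9, eta=1000.0)
# Small eta: reward dominates, so the policy becomes nearly greedy.
_, pi_greedy = kl_regularized_value_iteration(R, T, uniform, gamma=0.9, eta=0.01)
```

The temperature `eta` plays the role of the balance described in the abstract. A large `eta` keeps the policy near the reference, and a small `eta` approaches reward maximization. The backup never enumerates actions to find a maximum; it only averages under `ref_policy`. With sampled actions, that average could be estimated by sampling from the reference, which is the property the abstract highlights.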
