Timezone: »

Regret based Robust Solutions for Uncertain Markov Decision Processes
Asrar Ahmed · Pradeep Varakantham · Yossiri Adulyasak · Patrick Jaillet

Fri Dec 06 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focussed on the computation of {\em maximin} policies which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed {\em minimax} regret as a suitable alternative to the {\em maximin} objective for robust optimization. However, existing algorithms for handling {\em minimax} regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) Handle uncertainties over both transition and reward models; (b) Dependence of model uncertainties across state, action pairs and decision epochs; (c) Scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.

Author Information

Asrar Ahmed (Singapore Management University)
Pradeep Varakantham (Singapore Management University)
Yossiri Adulyasak (University of Montréal (HEC))
Patrick Jaillet (MIT)

More from the Same Authors