In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on computing maximin policies, which maximize the value under the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve along multiple dimensions: (a) handling uncertainty over both transition and reward models; (b) handling dependence of model uncertainties across state-action pairs and decision epochs; and (c) scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.
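To make the difference between the maximin and minimax regret objectives concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a hypothetical two-state, two-action MDP with two hand-written model samples (both transitions and rewards vary per sample), a horizon of 2, and brute-force enumeration over deterministic stationary policies. All names (`make_model`, `policy_value`, `max_regret`, and so on) are invented for this example.

```python
from itertools import product

def backup(T, R, v, s, a):
    # One-step Bellman backup: immediate reward plus expected next-step value.
    return R[s][a] + sum(p * v[s2] for s2, p in enumerate(T[s][a]))

def policy_value(model, policy, horizon, start=0):
    # Expected total reward of a deterministic stationary policy.
    T, R = model
    v = [0.0] * len(R)
    for _ in range(horizon):
        v = [backup(T, R, v, s, policy[s]) for s in range(len(R))]
    return v[start]

def optimal_value(model, horizon, start=0):
    # Finite-horizon optimal value by backward induction.
    T, R = model
    v = [0.0] * len(R)
    for _ in range(horizon):
        v = [max(backup(T, R, v, s, a) for a in range(len(R[s])))
             for s in range(len(R))]
    return v[start]

def make_model(p, goal_reward):
    # Hypothetical toy model: in state 0, action 0 ("safe") pays 1 and stays,
    # action 1 ("risky") pays 0 and reaches the absorbing state 1 with prob. p.
    T = [[[1.0, 0.0], [1.0 - p, p]],
         [[0.0, 1.0], [0.0, 1.0]]]
    R = [[1.0, 0.0], [goal_reward, goal_reward]]
    return T, R

def worst_value(policy, samples, horizon):
    # Maximin criterion: value under the worst sampled model.
    return min(policy_value(m, policy, horizon) for m in samples)

def max_regret(policy, samples, horizon):
    # Regret in a sample = optimal value in that sample minus the policy's value.
    return max(optimal_value(m, horizon) - policy_value(m, policy, horizon)
               for m in samples)

HORIZON = 2
samples = [make_model(0.1, 2.0), make_model(0.9, 6.0)]  # assumed samples
policies = list(product(range(2), repeat=2))            # action per state

maximin_policy = max(policies, key=lambda pi: worst_value(pi, samples, HORIZON))
regret_policy = min(policies, key=lambda pi: max_regret(pi, samples, HORIZON))

print("maximin policy:", maximin_policy)
print("minimax regret policy:", regret_policy)
```

In this toy instance the two objectives disagree: the maximin criterion picks the safe action in state 0, because its worst-case value is highest, while the minimax regret criterion picks the risky action, because the safe action forgoes a large payoff in the second sample. Enumeration like this only works for tiny problems and is meant purely to illustrate the objectives.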
Author Information
Asrar Ahmed (Singapore Management University)
Pradeep Varakantham (Singapore Management University)
Yossiri Adulyasak (University of Montréal (HEC))
Patrick Jaillet (MIT)
More from the Same Authors
- 2023 Poster: Memory-Constrained Algorithms for Convex Optimization
  Moise Blanchard · Junhui Zhang · Patrick Jaillet
- 2023 Poster: Quantum Bayesian Optimization
  Zhongxiang Dai · Gregory Kang Ruey Lau · Arun Verma · YAO SHU · Bryan Kian Hsiang Low · Patrick Jaillet
- 2023 Poster: Batch Bayesian Optimization For Replicable Experimental Design
  Zhongxiang Dai · Quoc Phong Nguyen · Sebastian Tay · Daisuke Urano · Richalynn Leong · Bryan Kian Hsiang Low · Patrick Jaillet
- 2023 Poster: Incentives in Private Collaborative Machine Learning
  Rachael Sim · Yehong Zhang · Nghia Hoang · Xinyi Xu · Bryan Kian Hsiang Low · Patrick Jaillet
- 2023 Poster: Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning
  Changyu CHEN · Ramesha Karunasena · Thanh H Nguyen · Arunesh Sinha · Pradeep Varakantham
- 2022 Poster: Effective Dimension in Bandit Problems under Censorship
  Gauthier Guinet · Saurabh Amin · Patrick Jaillet
- 2022 Poster: Trade-off between Payoff and Model Rewards in Shapley-Fair Collaborative Machine Learning
  Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet
- 2022 Poster: Sample-Then-Optimize Batch Neural Thompson Sampling
  Zhongxiang Dai · YAO SHU · Bryan Kian Hsiang Low · Patrick Jaillet
- 2021 Poster: Differentially Private Federated Bayesian Optimization with Distributed Exploration
  Zhongxiang Dai · Bryan Kian Hsiang Low · Patrick Jaillet
- 2021 Poster: Optimizing Conditional Value-At-Risk of Black-Box Functions
  Quoc Phong Nguyen · Zhongxiang Dai · Bryan Kian Hsiang Low · Patrick Jaillet
- 2020 Poster: Variational Bayesian Unlearning
  Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet
- 2020 Poster: Federated Bayesian Optimization via Thompson Sampling
  Zhongxiang Dai · Bryan Kian Hsiang Low · Patrick Jaillet
- 2020 Poster: No-regret Learning in Price Competitions under Consumer Reference Effects
  Negin Golrezaei · Patrick Jaillet · Jason Cheuk Nam Liang
- 2019 Poster: Implicit Posterior Variational Inference for Deep Gaussian Processes
  Haibin YU · Yizhou Chen · Bryan Kian Hsiang Low · Patrick Jaillet · Zhongxiang Dai
- 2019 Spotlight: Implicit Posterior Variational Inference for Deep Gaussian Processes
  Haibin YU · Yizhou Chen · Bryan Kian Hsiang Low · Patrick Jaillet · Zhongxiang Dai
- 2017: Aligned AI Poster Session
  Amanda Askell · Rafal Muszynski · William Wang · Yaodong Yang · Quoc Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet · Candice Schumann · Anqi Liu · Peter Eckersley · Angelina Wang · William Saunders
- 2017 Poster: Real-Time Bidding with Side Information
  arthur flajolet · Patrick Jaillet
- 2017 Poster: Online Learning with a Hint
  Ofer Dekel · arthur flajolet · Nika Haghtalab · Patrick Jaillet
- 2015 Poster: Inverse Reinforcement Learning with Locally Consistent Reward Functions
  Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet