In real-world sequential decision problems, exploration is expensive, and the risk of expert decision policies must be evaluated from limited data. In this setting, Monte Carlo (MC) risk estimators are typically used to estimate the risks associated with decision policies. While these estimators have the desirable low-bias property, they often suffer from large variance. In this paper, we consider the problem of minimizing the asymptotic mean squared error, and hence the variance, of MC risk estimators. We show that by carefully choosing the data-sampling policy (the behavior policy), we can obtain low-variance estimates of the risk of any given decision policy.
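To illustrate the core idea of behavior-policy selection for variance reduction, here is a minimal sketch (not the paper's algorithm) of an ordinary importance-sampling MC estimator in a one-step, two-action setting. All names and numbers (`target_policy`, `is_estimate`, the reward noise levels, the specific behavior policies) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative one-step setting (assumed, not from the paper):
# two actions with equal mean reward, but action 1 is much noisier.
target_policy = np.array([0.5, 0.5])
reward_means = np.array([1.0, 1.0])
reward_stds = np.array([0.1, 5.0])

def is_estimate(behavior_policy, n=2000):
    """Importance-sampling MC estimate of the target policy's expected
    reward, using actions sampled from behavior_policy."""
    actions = rng.choice(2, size=n, p=behavior_policy)
    rewards = rng.normal(reward_means[actions], reward_stds[actions])
    # Importance weights correct for sampling from the behavior policy,
    # keeping the estimator unbiased for the target policy's value.
    weights = target_policy[actions] / behavior_policy[actions]
    return np.mean(weights * rewards)

# Compare on-policy sampling against a behavior policy that
# oversamples the high-variance action.
for b in (np.array([0.5, 0.5]), np.array([0.2, 0.8])):
    estimates = [is_estimate(b) for _ in range(500)]
    print(f"behavior={b}: mean={np.mean(estimates):.3f}, "
          f"var={np.var(estimates):.5f}")
```

Running this sketch, both behavior policies give the same mean estimate (the importance weights preserve unbiasedness), but the policy that oversamples the noisy action yields a markedly smaller variance across repetitions; choosing the behavior policy to exploit this effect in sequential problems is the phenomenon the abstract refers to.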
Author Information
Elita Lobo (University of Massachusetts Amherst)
I am a second-year MS/PhD student currently working with Professor Marek Petrik in the field of robust reinforcement learning. Previously, I worked with Professor Rod Grupen on developing a hierarchical reinforcement learning framework for generating diverse skills, and with Professor Prashant Shenoy on predicting peak energy demand days for the energy grid. Prior to pursuing research at UMass, I worked as a software engineer at Flipkart, one of the top e-commerce companies in India. I am strongly interested in pursuing research in reinforcement learning and optimization.
Marek Petrik (University of New Hampshire)
Dharmashankar Subramanian (IBM Research)
More from the Same Authors
- 2021: Unbiased Efficient Feature Counts for Inverse RL
  Gerard Donahue · Brendan Crowe · Marek Petrik · Daniel Brown
- 2023 Poster: Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor
  Julien Grand-Clément · Marek Petrik
- 2023 Poster: Percentile Criterion Optimization in Offline Reinforcement Learning
  Cyrus Cousins · Elita Lobo · Marek Petrik · Yair Zick
- 2023 Poster: On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
  Jia Lin Hau · Erick Delage · Mohammad Ghavamzadeh · Marek Petrik
- 2023 Poster: Pairwise Causality Guided Transformers for Event Sequences
  Xiao Shou · Debarun Bhattacharjya · Tian Gao · Dharmashankar Subramanian · Oktie Hassanzadeh · Kristin P Bennett
- 2022 Poster: Robust $\phi$-Divergence MDPs
  Chin Pang Ho · Marek Petrik · Wolfram Wiesemann
- 2021: Safe RL Panel Discussion
  Animesh Garg · Marek Petrik · Shie Mannor · Claire Tomlin · Ugo Rosolia · Dylan Hadfield-Menell
- 2021 Workshop: Safe and Robust Control of Uncertain Systems
  Ashwin Balakrishna · Brijen Thananjeyan · Daniel Brown · Marek Petrik · Melanie Zeilinger · Sylvia Herbert
- 2021 Poster: Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs
  Bahram Behzadian · Marek Petrik · Chin Pang Ho
- 2021 Poster: Causal Inference for Event Pairs in Multivariate Point Processes
  Tian Gao · Dharmashankar Subramanian · Debarun Bhattacharjya · Xiao Shou · Nicholas Mattei · Kristin P Bennett
- 2020 Poster: Bayesian Robust Optimization for Imitation Learning
  Daniel S. Brown · Scott Niekum · Marek Petrik
- 2019 Workshop: Safety and Robustness in Decision-making
  Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow
- 2019 Poster: Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
  Marek Petrik · Reazul Hasan Russel
- 2018 Poster: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes
  Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart
- 2018 Spotlight: Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes
  Andrea Tirinzoni · Marek Petrik · Xiangli Chen · Brian Ziebart
- 2018 Poster: Proximal Graphical Event Models
  Debarun Bhattacharjya · Dharmashankar Subramanian · Tian Gao
- 2018 Spotlight: Proximal Graphical Event Models
  Debarun Bhattacharjya · Dharmashankar Subramanian · Tian Gao
- 2016 Poster: Safe Policy Improvement by Minimizing Robust Baseline Regret
  Mohammad Ghavamzadeh · Marek Petrik · Yinlam Chow
- 2014 Workshop: From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)
  Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar
- 2014 Poster: RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning
  Marek Petrik · Dharmashankar Subramanian
- 2014 Spotlight: RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning
  Marek Petrik · Dharmashankar Subramanian