Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or infeasible. However, adopting such policies in practice is often challenging: they are hard to interpret within the application context and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we make three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementations tailored to the application context, and finally, 3) we propose a way to interpret ESRL’s policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL). The sample efficiency of ESRL is independent of the chosen risk-aversion threshold and of the quality of the behavior policy.
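The abstract describes ESRL's decision rule in words: at each state, a hypothesis test on the posterior decides whether the learned action is confidently better than the behavior policy's action, with a tunable risk-aversion level. The sketch below is only an illustration of that idea, not the authors' implementation: it assumes the Q-value posterior is available as Monte Carlo samples, and the names `alpha` (risk-aversion threshold), `q_samples`, and `behavior_action` are hypothetical.

```python
# Minimal sketch (not the authors' code) of risk-averse action selection via a
# posterior hypothesis test: keep the greedy action only when the posterior says
# it beats the behavior policy's action with high enough probability.
import numpy as np


def risk_averse_action(q_samples: np.ndarray, behavior_action: int, alpha: float = 0.1) -> int:
    """Return the greedy action only if the posterior supports it over the behavior action.

    q_samples: array of shape (n_posterior_samples, n_actions), draws from the
        Q-value posterior at the current state (assumed given).
    behavior_action: action the logged behavior policy would take.
    alpha: risk-aversion level; smaller alpha means a more conservative policy.
    """
    greedy_action = int(np.argmax(q_samples.mean(axis=0)))
    # Posterior probability that the greedy action is NOT better than the behavior action.
    p_not_better = float(np.mean(q_samples[:, greedy_action] <= q_samples[:, behavior_action]))
    # Accept the learned action only when that probability falls below alpha; otherwise defer.
    return greedy_action if p_not_better < alpha else behavior_action


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake posterior: 500 draws over 3 actions; action 2 looks best but is uncertain.
    q_samples = rng.normal(loc=[0.0, 0.2, 0.5], scale=[0.1, 0.1, 0.6], size=(500, 3))
    print(risk_averse_action(q_samples, behavior_action=1, alpha=0.05))
```

With a small alpha the rule defers to the behavior action unless the posterior evidence is strong, which is one way to read the "different levels of risk-averse implementations" described above.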
Author Information
Aaron Sonabend (Harvard University)
Junwei Lu
Leo Anthony Celi (Massachusetts Institute of Technology)
Tianxi Cai (Harvard School of Public Health)
Peter Szolovits (MIT)
More from the Same Authors
- 2021 : Chest ImaGenome Dataset for Clinical Reasoning
  Joy T Wu · Nkechinyere Agu · Ismini Lourentzou · Arjun Sharma · Joseph Alexander Paguio · Jasper Seth Yao · Edward C Dee · William Mitchell · Satyananda Kashyap · Andrea Giovannini · Leo Anthony Celi · Mehdi Moradi
- 2022 : Structure-Inducing Pre-training
  Matthew McDermott · Brendan Yap · Peter Szolovits · Marinka Zitnik
- 2022 Workshop: Gaze meets ML
  Ismini Lourentzou · Joy T Wu · Satyananda Kashyap · Alexandros Karargyris · Leo Anthony Celi · Ban Kawas · Sachin S Talathi
- 2020 : Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects
  Aaron Sonabend
- 2018 Poster: Sketching Method for Large Scale Combinatorial Inference
  Wei Sun · Junwei Lu · Han Liu
- 2017 : Poster spotlights
  Hiroshi Kuwajima · Masayuki Tanaka · Qingkai Liang · Matthieu Komorowski · Fanyu Que · Thalita F Drumond · Aniruddh Raghu · Leo Anthony Celi · Christina Göpfert · Andrew Ross · Sarah Tan · Rich Caruana · Yin Lou · Devinder Kumar · Graham Taylor · Forough Poursabzi-Sangdeh · Jennifer Wortman Vaughan · Hanna Wallach
- 2016 : Opening Keynote by Leo Anthony Celi: Data-Driven Healthcare
  Leo Anthony Celi