Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior policy to evaluate and learn new policies, is crucial in applications where experimentation is limited, such as medicine. We study the estimation of the policy value and gradient of a deterministic policy from off-policy data when actions are continuous. Targeting deterministic policies, for which the action is a deterministic function of the state, is crucial since optimal policies are always deterministic (up to ties). In this setting, standard importance sampling and doubly robust estimators for the policy value and gradient fail because the density ratio does not exist. To circumvent this issue, we propose several new doubly robust estimators based on different kernelization approaches. We analyze the asymptotic mean-squared error of each of these under mild rate conditions for nuisance estimators. Specifically, we demonstrate how to obtain a rate that is independent of the horizon length.
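As a rough illustration of the kernelization idea, the following Python sketch shows a kernel-smoothed doubly robust value estimate for a deterministic target policy in the single-step (contextual-bandit) case with continuous actions: the ill-defined density ratio is replaced by a kernel weight centered at the target action. The names (pi, q_hat, f_behavior, bandwidth h), the Gaussian kernel, and the toy data are illustrative assumptions, not the paper's implementation.

# Hedged sketch, not the authors' code: kernelized doubly robust (DR) value
# estimate for a deterministic policy pi with scalar continuous actions.
import numpy as np

def gaussian_kernel(u, h):
    # Gaussian smoothing kernel K_h(u) = (1/h) * phi(u/h), bandwidth h.
    return np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2 * np.pi) * h)

def kernelized_dr_value(states, actions, rewards, pi, q_hat, f_behavior, h=0.1):
    # states, actions, rewards: logged off-policy data (actions are scalars here)
    # pi(s): deterministic target policy; q_hat(s, a): estimated outcome model
    # f_behavior(a, s): estimated behavior density at (s, a); h: kernel bandwidth
    target_actions = np.array([pi(s) for s in states])
    # Direct-method term: plug the target action into the outcome model.
    dm = np.array([q_hat(s, a) for s, a in zip(states, target_actions)])
    # Kernel-smoothed weights replace the nonexistent density ratio.
    weights = gaussian_kernel(actions - target_actions, h) / np.array(
        [f_behavior(a, s) for a, s in zip(actions, states)]
    )
    residuals = rewards - np.array([q_hat(s, a) for s, a in zip(states, actions)])
    return np.mean(dm + weights * residuals)

# Toy usage with synthetic data and simple nuisance models (all assumptions).
rng = np.random.default_rng(0)
S = rng.normal(size=200)
A = rng.normal(size=200)                       # behavior policy: a ~ N(0, 1)
R = -(A - S) ** 2 + rng.normal(scale=0.1, size=200)
est = kernelized_dr_value(
    S, A, R,
    pi=lambda s: s,                             # deterministic target policy
    q_hat=lambda s, a: -(a - s) ** 2,           # outcome-model nuisance
    f_behavior=lambda a, s: np.exp(-0.5 * a ** 2) / np.sqrt(2 * np.pi),
    h=0.2,
)
print(f"kernelized DR value estimate: {est:.3f}")

The bandwidth h controls the bias-variance trade-off that drives the mean-squared-error analysis: smaller h reduces smoothing bias but inflates the variance of the kernel weights.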
Author Information
Nathan Kallus (Cornell University)
Masatoshi Uehara (Cornell University)
More from the Same Authors
- 2020 Workshop: Consequential Decisions in Dynamic Environments
  Niki Kilbertus · Angela Zhou · Ashia Wilson · John Miller · Lily Hu · Lydia T. Liu · Nathan Kallus · Shira Mitchell
- 2020 Poster: Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
  Nathan Kallus · Angela Zhou
- 2020 Poster: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
  Masatoshi Uehara · Masahiro Kato · Shota Yasui
- 2020 Spotlight: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
  Masatoshi Uehara · Masahiro Kato · Shota Yasui
- 2019 Workshop: “Do the right thing”: machine learning and causal inference for improved decision making
  Michele Santacatterina · Thorsten Joachims · Nathan Kallus · Adith Swaminathan · David Sontag · Angela Zhou
- 2019 Poster: The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the XAUC Metric
  Nathan Kallus · Angela Zhou
- 2019 Poster: Assessing Disparate Impact of Personalized Interventions: Identifiability and Bounds
  Nathan Kallus · Angela Zhou
- 2019 Poster: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
  Nathan Kallus · Masatoshi Uehara
- 2019 Poster: Policy Evaluation with Latent Confounders via Optimal Balance
  Andrew Bennett · Nathan Kallus
- 2019 Poster: Deep Generalized Method of Moments for Instrumental Variable Analysis
  Andrew Bennett · Nathan Kallus · Tobias Schnabel
- 2018 Workshop: Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy
  Manuela Veloso · Nathan Kallus · Sameena Shah · Senthil Kumar · Isabelle Moulinier · Jiahao Chen · John Paisley
- 2018 Poster: Causal Inference with Noisy and Missing Covariates via Matrix Factorization
  Nathan Kallus · Xiaojie Mao · Madeleine Udell
- 2018 Poster: Removing Hidden Confounding by Experimental Grounding
  Nathan Kallus · Aahlad Puli · Uri Shalit
- 2018 Spotlight: Removing Hidden Confounding by Experimental Grounding
  Nathan Kallus · Aahlad Puli · Uri Shalit
- 2018 Poster: Confounding-Robust Policy Improvement
  Nathan Kallus · Angela Zhou
- 2018 Poster: Balanced Policy Evaluation and Learning
  Nathan Kallus
- 2017 Workshop: From 'What If?' To 'What Next?': Causal Inference and Machine Learning for Intelligent Decision Making
  Ricardo Silva · Panagiotis Toulis · John Shawe-Taylor · Alexander Volfovsky · Thorsten Joachims · Lihong Li · Nathan Kallus · Adith Swaminathan