
Off-Policy Evaluation with Policy-Dependent Optimization Response
Wenshuo Guo · Michael Jordan · Angela Zhou

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #724

The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an average of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an average but rather as the output of a downstream decision-making problem (such as matching, assignment, network flow, or minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with policy-dependent linear optimization responses: causal outcomes introduce stochasticity in objective function coefficients. Under this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of optimization bias even for the case of policy evaluation. We construct unbiased estimators for the policy-dependent estimand by a perturbation method, and discuss asymptotic variance properties for a set of adjusted plug-in estimators. Lastly, attaining unbiased policy evaluation allows for policy optimization: we provide a general algorithm for optimizing causal interventions. We corroborate our theoretical results with numerical simulations.
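
The optimization bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator or their perturbation method; it is a toy setup chosen for illustration. It assumes a linear program over the probability simplex (so the optimal value is simply the smallest coefficient), Gaussian noise on the estimated coefficients, and hypothetical helper names `plug_in_value` and `simulate_bias`. It shows only that plugging noisy coefficient estimates into the optimization yields a value that is biased downward on average.

```python
import random


def plug_in_value(c_hat):
    """Optimal value of min_x c^T x over the probability simplex.

    The optimum of a linear objective over the simplex sits at a vertex,
    so the value equals the smallest coefficient. (Toy feasible set,
    assumed here for illustration only.)
    """
    return min(c_hat)


def simulate_bias(true_c, noise_sd=1.0, n_reps=20000, seed=0):
    """Average plug-in value minus the true optimal value.

    Each replication draws unbiased but noisy coefficient estimates
    (hypothetical Gaussian noise) and solves the toy problem with them.
    """
    rng = random.Random(seed)
    true_value = plug_in_value(true_c)
    total = 0.0
    for _ in range(n_reps):
        c_hat = [c + rng.gauss(0.0, noise_sd) for c in true_c]
        total += plug_in_value(c_hat)
    return total / n_reps - true_value


# Three equally good options: the true optimal value is 1.0.
bias = simulate_bias([1.0, 1.0, 1.0])
print(f"plug-in bias: {bias:.3f}")
```

Although every coefficient estimate is unbiased, the minimum over noisy estimates systematically selects the most favorable noise draw, so the reported bias is clearly negative. With three equal coefficients and unit noise it should land near the expected minimum of three standard normals, roughly -0.85.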

Author Information

Wenshuo Guo (UC Berkeley)
Michael Jordan (UC Berkeley)
Angela Zhou (University of Southern California)
