We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts. These arrivals generate unknown individual-level responses to agent actions, and the actions induce known transitions. This model is relevant, for example, to dynamic personalized pricing and other operations management problems with potentially high-dimensional user types. The individual-level response is not causally affected by the state variable. In this setting, we adapt doubly-robust estimation from the single-timestep setting to the sequential setting, so that a state-dependent policy can be learned even from a single timestep's worth of data. We introduce a \textit{marginal MDP} model and study an algorithm for off-policy learning, which can be viewed as fitted value iteration in the marginal MDP. We also provide structural results on when errors in the response model lead to the persistence, rather than the attenuation, of error over time. In simulations, we show that the advantages of doubly-robust estimation in the single-timestep setting, namely unbiased and lower-variance estimation, can directly translate to improved out-of-sample policy performance. This structure-specific analysis sheds light on the underlying structure of a class of problems, operations research/management problems, which are often heralded as a real-world domain for offline RL but are in fact qualitatively easier.
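For readers less familiar with the single-timestep starting point, the sketch below shows the standard doubly-robust estimator of a target policy's value from logged contextual data with discrete actions. It is only an illustration of that textbook estimator, which the abstract says is adapted; it is not the paper's sequential estimator or its marginal-MDP learning algorithm. The function name `dr_policy_value` and the array layout (target probabilities `pi`, logging probabilities `mu`, fitted outcome model `q_hat`, each of shape `(n, K)`) are hypothetical choices for this sketch. The estimate combines a model-based (direct-method) term with an importance-weighted residual, which keeps it unbiased when either the outcome model or the logging propensities are correct.

```python
import numpy as np


def dr_policy_value(pi, mu, q_hat, actions, rewards):
    """Single-timestep doubly-robust estimate of a target policy's value.

    Hypothetical interface for illustration:
      pi[i, a]    -- target-policy probability of action a in context i
      mu[i, a]    -- logging-policy probability of action a in context i
      q_hat[i, a] -- fitted outcome model, an estimate of E[r | x_i, a]
      actions[i]  -- logged action index; rewards[i] -- observed reward
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    idx = np.arange(len(actions))

    # Direct-method term: outcome-model value of the target policy per context.
    dm = (pi * q_hat).sum(axis=1)

    # Importance weight of the logged action under target vs. logging policy.
    w = pi[idx, actions] / mu[idx, actions]

    # Weighted residual corrects the outcome model's error on logged actions.
    correction = w * (rewards - q_hat[idx, actions])

    return float(np.mean(dm + correction))


# Usage: uniform logging over two actions, target policy picks action i in context i.
print(dr_policy_value([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]],
                      [[1, 0], [0, 2]], [0, 1], [1.0, 2.0]))  # 1.5
```

When `q_hat` is identically zero, the estimate reduces to inverse propensity weighting; when `q_hat` fits the logged rewards exactly, the correction term vanishes and only the direct-method term remains.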
Author Information
Angela Zhou (Cornell University)
More from the Same Authors
2021 : It's COMPASlicated: The Messy Relationship between RAI Datasets and Algorithmic Fairness Benchmarks
Michelle Bao · Angela Zhou · Samantha Zottola · Brian Brubach · Sarah Desmarais · Aaron Horowitz · Kristian Lum · Suresh Venkatasubramanian
2021 : Stateful Offline Contextual Policy Evaluation and Learning
Angela Zhou
2021 Workshop: Machine Learning Meets Econometrics (MLECON)
David Bruns-Smith · Arthur Gretton · Limor Gultchin · Niki Kilbertus · Krikamol Muandet · Evan Munro · Angela Zhou
2020 Workshop: Consequential Decisions in Dynamic Environments
Niki Kilbertus · Angela Zhou · Ashia Wilson · John Miller · Lily Hu · Lydia T. Liu · Nathan Kallus · Shira Mitchell
2020 : Spotlight Talk 4: Fairness, Welfare, and Equity in Personalized Pricing
Nathan Kallus · Angela Zhou
2020 Poster: Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Nathan Kallus · Angela Zhou
2019 : Coffee Break and Poster Session
Rameswar Panda · Prasanna Sattigeri · Kush Varshney · Karthikeyan Natesan Ramamurthy · Harvineet Singh · Vishwali Mhasawade · Shalmali Joshi · Laleh Seyyed-Kalantari · Matthew McDermott · Gal Yona · James Atwood · Hansa Srinivasan · Yonatan Halpern · D. Sculley · Behrouz Babaki · Margarida Carvalho · Josie Williams · Narges Razavian · Haoran Zhang · Amy Lu · Irene Y Chen · Xiaojie Mao · Angela Zhou · Nathan Kallus
2019 : Opening Remarks
Thorsten Joachims · Nathan Kallus · Michele Santacatterina · Adith Swaminathan · David Sontag · Angela Zhou
2019 Workshop: “Do the right thing”: machine learning and causal inference for improved decision making
Michele Santacatterina · Thorsten Joachims · Nathan Kallus · Adith Swaminathan · David Sontag · Angela Zhou
2019 Poster: The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the XAUC Metric
Nathan Kallus · Angela Zhou
2019 Poster: Assessing Disparate Impact of Personalized Interventions: Identifiability and Bounds
Nathan Kallus · Angela Zhou
2018 Poster: Confounding-Robust Policy Improvement
Nathan Kallus · Angela Zhou