Off-Policy Evaluation for Action-Dependent Non-stationary Environments
Yash Chandak · Shiv Shankar · Nathaniel Bastian · Bruno da Silva · Emma Brunskill · Philip Thomas

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #327

Methods for sequential decision-making are often built on the foundational assumption that the underlying decision process is stationary. This limits their applicability, because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-stationarity). In this work, we take the first steps towards addressing the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a higher-order stationarity assumption: non-stationarity causes changes over time, but the way those changes happen is fixed. We propose OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrument-variable regression to obtain an estimate, with both lower bias and lower variance, of the structure in the changes of a policy's past performances. Finally, we show promising results on how OPEN can be used to predict future performances in several domains inspired by real-world applications that exhibit non-stationarity.
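
The two ingredients named in the abstract, importance weighting and instrument-variable regression, can be illustrated with a minimal Python sketch. This is not the authors' OPEN algorithm. It assumes, purely for illustration, that the performance sequence follows a linear first-order relation (next performance ≈ a × current performance + b) and uses plain two-stage least squares with the previous estimate as the instrument. The function names `is_estimates`, `two_stage_ar1`, and `forecast_next` are hypothetical.

```python
import numpy as np

def is_estimates(returns, pi_probs, beta_probs):
    """Per-episode importance-sampling estimates of an evaluation policy's return.

    returns:    shape (n,)   observed return of each episode under the behavior policy
    pi_probs:   shape (n, T) evaluation-policy probabilities of the actions taken
    beta_probs: shape (n, T) behavior-policy probabilities of the same actions
    """
    rho = np.prod(np.asarray(pi_probs) / np.asarray(beta_probs), axis=1)
    return rho * np.asarray(returns, dtype=float)

def two_stage_ar1(y):
    """Two-stage least squares fit of y[i+1] ~ a * y[i] + b.

    y[i] is a noisy estimate, so regressing on it directly biases the slope.
    Here y[i-1] serves as the instrument for y[i] (an illustrative assumption).
    """
    y = np.asarray(y, dtype=float)
    z, x, t = y[:-2], y[1:-1], y[2:]          # instrument, regressor, target
    Z = np.column_stack([z, np.ones_like(z)])
    # Stage 1: project the noisy regressor onto the instrument.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress the target on the projected regressor.
    X_hat = np.column_stack([x_hat, np.ones_like(x_hat)])
    a, b = np.linalg.lstsq(X_hat, t, rcond=None)[0]
    return a, b

def forecast_next(y, a, b):
    """One-step-ahead performance forecast under the fitted linear relation."""
    return a * y[-1] + b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic per-episode performance estimates (made-up data for the demo).
    true = [0.0]
    for _ in range(200):
        true.append(0.8 * true[-1] + 1.0 + rng.normal(scale=0.3))
    noisy = np.array(true) + rng.normal(scale=0.5, size=len(true))
    a, b = two_stage_ar1(noisy)
    print("fitted a, b:", a, b)
    print("forecast:", forecast_next(noisy, a, b))
```

In this sketch the importance weights correct for the mismatch between the behavior and evaluation policies in each past episode, and the second regression stage avoids regressing directly on a noisy performance estimate. The method described in the abstract additionally applies counterfactual reasoning twice and weights the regression itself, which is not reproduced here.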

Author Information

Yash Chandak (Stanford University)
Shiv Shankar (IIT Bombay)
Nathaniel Bastian (United States Military Academy)
A science, technology, engineering and mathematics (STEM) leader, researcher, and educator with specialization in algorithms, techniques, tools and technologies from operations research, data science, artificial intelligence, systems engineering, and applied economics to research, design, develop, and deploy solutions to enable the improvement and enhancement of decision-making in the national security (military, cyber and defense) domain.

Bruno da Silva (University of Massachusetts)
Emma Brunskill (Stanford University)
Philip Thomas (University of Massachusetts Amherst)