Spotlight Poster

Adversarial Counterfactual Environment Model Learning

Xiong-Hui Chen · Yang Yu · Zhengmao Zhu · ZhiHua Yu · Chen Zhenjun · Chenghe Wang · Yinan Wu · Rong-Jun Qin · Hongqiu Wu · Ruijin Ding · Huang Fangsheng

Great Hall & Hall B1+B2 (level 1) #1415
Thu 14 Dec 8:45 a.m. PST — 10:45 a.m. PST

Abstract:

An accurate environment dynamics model is crucial for various downstream tasks, such as counterfactual prediction, off-policy evaluation, and offline reinforcement learning. Currently, these models are learned through empirical risk minimization (ERM) by step-wise fitting of historical transition data. However, we first show that, particularly in the sequential decision-making setting, this approach may catastrophically fail to predict counterfactual action effects due to the selection bias of behavior policies during data collection. To tackle this problem, we introduce a novel model-learning objective called adversarial weighted empirical risk minimization (AWRM). AWRM incorporates an adversarial policy that exploits the model to generate a data distribution that weakens the model's prediction accuracy, and the model is subsequently learned under this adversarial data distribution. We implement a practical algorithm, GALILEO, for AWRM and evaluate it on two synthetic tasks, three continuous-control tasks, and a real-world application. The experiments demonstrate that GALILEO can accurately predict counterfactual action effects and improve various downstream tasks, including offline policy evaluation and improvement, as well as online decision-making.
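For reference, the AWRM objective described above can be read as a min-max problem. The notation below is an illustrative sketch based only on the abstract, not the paper's exact formulation: M denotes the learned dynamics model, T the real transition function, \pi the adversarial policy, d_{\pi} the state-action distribution induced by \pi, and \ell a per-transition prediction loss.

\min_{M} \max_{\pi} \; \mathbb{E}_{(s,a) \sim d_{\pi}} \big[ \ell\big( M(s,a),\, T(s,a) \big) \big]

In contrast to ERM, which weights transitions according to the behavior policy's data distribution, the inner maximization reweights the risk toward state-action regions where the current model is least accurate, and the outer minimization then fits the model under that adversarial distribution.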
