We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning. We begin by defining the problem of learning from confounded expert data in a contextual MDP setup. We analyze the limitations of learning from such data with and without external reward and propose an adjustment of standard imitation learning algorithms to fit this setup. In addition, we discuss the problem of distribution shift between the expert data and the online environment when partial observability is present in the data. We prove possibility and impossibility results for imitation learning under arbitrary distribution shift of the missing covariates. When additional external reward is provided, we propose a sampling procedure that addresses the unknown shift and prove convergence to an optimal solution. Finally, we validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
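The abstract's central difficulty is that an imitator who cannot see the context the expert acted on can fail once that context's distribution shifts. A deliberately tiny example makes this concrete. The sketch below is hypothetical and is not the paper's algorithm, analysis, or environments. It assumes a one-step problem with a binary hidden context z that the expert observes and the learner does not, a reward of 1 when the action matches z, and greedy behavioral cloning as a stand-in imitation learner. The names `expected_reward` and `greedy_blind_policy` and the probabilities 0.8 and 0.2 are illustrative choices.

```python
# Hypothetical toy: a one-step contextual problem with a latent confounder z.
# Not the paper's algorithm or environments; numbers are illustrative only.

def expected_reward(policy, p_z1):
    """Expected reward of a policy that cannot observe the context z.

    policy: dict mapping action (0 or 1) -> probability.
    p_z1:   probability that the hidden context z equals 1.
    Assumed reward: 1 if the action matches z, else 0.
    """
    p_z = {0: 1.0 - p_z1, 1: p_z1}
    return sum(policy[a] * p_z[a] for a in (0, 1))


def greedy_blind_policy(p_z1):
    """Deterministic policy that always plays the more likely value of z."""
    a_star = 1 if p_z1 >= 0.5 else 0
    return {a_star: 1.0, 1 - a_star: 0.0}


# The expert observes z and plays a = z, so the expert's marginal action
# frequencies equal the context frequencies in the logged data. A greedy
# behavioral-cloning learner that cannot see z therefore reduces to the
# greedy blind policy for the *data* distribution of z.
P_Z1_EXPERT_DATA = 0.8   # assumed P(z = 1) in the expert data
P_Z1_ONLINE = 0.2        # assumed P(z = 1) online, after covariate shift

clone = greedy_blind_policy(P_Z1_EXPERT_DATA)
oracle = greedy_blind_policy(P_Z1_ONLINE)  # best context-blind policy online

print("clone, data distribution:", round(expected_reward(clone, P_Z1_EXPERT_DATA), 3))
print("clone, shifted online:   ", round(expected_reward(clone, P_Z1_ONLINE), 3))
print("best blind, online:      ", round(expected_reward(oracle, P_Z1_ONLINE), 3))
```

Under these assumptions the clone earns 0.8 on the expert's context distribution but only 0.2 after the shift, while the best context-blind policy for the online distribution earns 0.8. The toy only illustrates the failure mode; it does not reproduce the paper's possibility and impossibility results or its reward-based sampling procedure.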
Author Information
Guy Tennenholtz (Technion)
Assaf Hallak (Technion)
Gal Dalal (NVIDIA)
Shie Mannor (Technion)
Gal Chechik (NVIDIA)
Uri Shalit (Technion)
Related Events (a corresponding poster, oral, or spotlight)
2021: Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning
More from the Same Authors
2021: Bandits with Partially Observable Confounded Data
Guy Tennenholtz · Uri Shalit · Shie Mannor · Yonathan Efroni
2021: Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning
Guy Tennenholtz · Assaf Hallak · Gal Dalal · Shie Mannor · Gal Chechik · Uri Shalit
2021: Latent Geodesics of Model Dynamics for Offline Reinforcement Learning
Guy Tennenholtz · Nir Baram · Shie Mannor
2021: Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning
Roy Zohar · Shie Mannor · Guy Tennenholtz
2021: Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning (Guy Tennenholtz)
Guy Tennenholtz
2021: Uri Shalit - Calibration, out-of-distribution generalization and a path towards causal representations
Uri Shalit
2021 Poster: Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Gal Dalal · Assaf Hallak · Steven Dalton · Iuri Frosio · Shie Mannor · Gal Chechik
2020: Mini-panel discussion 2 - Real World RL: An industry perspective
Franziska Meier · Gabriel Dulac-Arnold · Shie Mannor · Timothy A Mann
2020 Workshop: The Challenges of Real World Reinforcement Learning
Daniel Mankowitz · Gabriel Dulac-Arnold · Shie Mannor · Omer Gottesman · Anusha Nagabandi · Doina Precup · Timothy A Mann
2020 Poster: Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models
Andrew Jesson · Sören Mindermann · Uri Shalit · Yarin Gal
2020 Poster: A causal view of compositional zero-shot recognition
Yuval Atzmon · Felix Kreuk · Uri Shalit · Gal Chechik
2020 Spotlight: A causal view of compositional zero-shot recognition
Yuval Atzmon · Felix Kreuk · Uri Shalit · Gal Chechik
2020 Poster: Online Planning with Lookahead Policies
Yonathan Efroni · Mohammad Ghavamzadeh · Shie Mannor
2019 Poster: Distributional Policy Optimization: An Alternative Approach for Continuous Control
Chen Tessler · Guy Tennenholtz · Shie Mannor
2019 Poster: Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning
Chao Qu · Shie Mannor · Huan Xu · Yuan Qi · Le Song · Junwu Xiong
2018: Discussion Panel: Ryan Adams, Nicolas Heess, Leslie Kaelbling, Shie Mannor, Emo Todorov (moderator: Roy Fox)
Ryan Adams · Nicolas Heess · Leslie Kaelbling · Shie Mannor · Emo Todorov · Roy Fox
2018: Hierarchical RL: From Prior Knowledge to Policies (Shie Mannor)
Shie Mannor
2018 Poster: Removing Hidden Confounding by Experimental Grounding
Nathan Kallus · Aahlad Puli · Uri Shalit
2018 Spotlight: Removing Hidden Confounding by Experimental Grounding
Nathan Kallus · Aahlad Puli · Uri Shalit
2018 Poster: Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Tom Zahavy · Matan Haroush · Nadav Merlis · Daniel J Mankowitz · Shie Mannor
2017 Workshop: Hierarchical Reinforcement Learning
Andrew G Barto · Doina Precup · Shie Mannor · Tom Schaul · Roy Fox · Carlos Florensa