
Workshop: 3rd Offline Reinforcement Learning Workshop: Offline RL as a "Launchpad"

Using Confounded Data in Offline RL

Maxime Gasse · Damien GRASSET · Guillaume Gaudron · Pierre-Yves Oudeyer

Abstract: In this work we consider the problem of confounding in offline RL, also called the delusion problem. While it is known that learning from purely offline data is a hazardous endeavor in the presence of confounding, in this paper we show that offline, confounded data can be safely combined with online, non-confounded data to improve the sample-efficiency of model-based RL. We import ideas from the well-established framework of $do$-calculus to express model-based RL as a causal inference problem, thus bridging the fields of RL and causality. We propose a latent-based method which we prove is correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline, confounded data (in the asymptotic case), regardless of the expert's behavior. We illustrate the effectiveness of our method on a series of synthetic experiments.
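The confounding ("delusion") problem can be made concrete with a minimal sketch on a hypothetical one-step toy problem. The problem is not taken from the paper, and every name and probability below is an illustrative assumption. An expert who observes a hidden variable U generates the offline data. Because of that, the conditional P(Y | A) estimated from those data differs from the interventional P(Y | do(A)) that an online agent, blind to U, actually experiences. Exact probabilities are computed with Python's standard `fractions.Fraction`. The sketch does not implement the authors' latent-based method.

```python
from fractions import Fraction

# Hypothetical toy problem (illustrative, not from the paper):
# hidden binary confounder U, binary action A, binary reward Y.
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def expert_policy(a, u):
    """P(A=a | U=u): the offline expert sees U and usually picks the matching action."""
    p_one = Fraction(9, 10) if u == 1 else Fraction(1, 10)
    return p_one if a == 1 else 1 - p_one


def reward_model(y, a, u):
    """P(Y=y | A=a, U=u): an action pays off mostly when it matches U."""
    p_one = Fraction(9, 10) if a == u else Fraction(1, 10)
    return p_one if y == 1 else 1 - p_one


def observational(y, a):
    """P(Y=y | A=a), as a model fit naively to the confounded offline data would estimate it."""
    joint = sum(P_U[u] * expert_policy(a, u) * reward_model(y, a, u) for u in P_U)
    marginal = sum(P_U[u] * expert_policy(a, u) for u in P_U)
    return joint / marginal


def interventional(y, a):
    """P(Y=y | do(A=a)): the action is set independently of U, so U keeps its prior."""
    return sum(P_U[u] * reward_model(y, a, u) for u in P_U)


print(observational(1, 1))   # 41/50
print(interventional(1, 1))  # 1/2
```

In this toy setting, a model trained only on the expert's data would predict a reward probability of 41/50 for action 1. An agent that takes action 1 online, without access to U, obtains only 1/2. Online, non-confounded interactions reveal this gap, which is why they are needed alongside the offline data.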
