Using Confounded Data in Offline RL
Maxime Gasse · Damien GRASSET · Guillaume Gaudron · Pierre-Yves Oudeyer

In this work we consider the problem of confounding in offline RL, also called the delusion problem. While it is known that learning from purely offline data is a hazardous endeavor in the presence of confounding, in this paper we show that offline, confounded data can be safely combined with online, non-confounded data to improve the sample-efficiency of model-based RL. We import ideas from the well-established framework of $do$-calculus to express model-based RL as a causal inference problem, thus bridging the fields of RL and causality. We propose a latent-based method which we prove is correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline, confounded data (in the asymptotic case), regardless of the expert's behavior. We illustrate the effectiveness of our method on a series of synthetic experiments.
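To make the notion of confounding concrete, here is a minimal sketch of why expert data alone can mislead a learner. It is a hypothetical toy problem, not the paper's method or experiments: a one-step bandit with a hidden binary state `u` that the expert observes and the learner does not. All names and probabilities (`P_U`, `expert_policy`, `reward_prob`) are assumptions chosen for illustration. The code computes, by exact enumeration, the observational quantity $P(r=1 \mid a)$ that offline data would yield and the interventional quantity $P(r=1 \mid do(a))$ that an online agent would actually experience.

```python
# Hypothetical toy problem (not from the paper): a one-step bandit with a
# hidden binary confounder u that the expert sees but the learner does not.
P_U = {0: 0.5, 1: 0.5}  # assumed prior over the hidden state u


def expert_policy(a, u):
    """P(a | u): assumed expert that matches the hidden state 90% of the time."""
    return 0.9 if a == u else 0.1


def reward_prob(a, u):
    """P(r = 1 | a, u): assumed reward, obtained only when the action matches u."""
    return 1.0 if a == u else 0.0


def observational(a):
    """P(r = 1 | a) under the expert's behavior, i.e. what offline data shows."""
    joint = sum(P_U[u] * expert_policy(a, u) * reward_prob(a, u) for u in P_U)
    marginal = sum(P_U[u] * expert_policy(a, u) for u in P_U)
    return joint / marginal


def interventional(a):
    """P(r = 1 | do(a)): the action is set independently of the hidden state."""
    return sum(P_U[u] * reward_prob(a, u) for u in P_U)


if __name__ == "__main__":
    for a in (0, 1):
        print(f"a={a}  P(r=1|a)={observational(a):.2f}  "
              f"P(r=1|do(a))={interventional(a):.2f}")
```

In this toy setting the offline estimate is about 0.9 for either action, while the interventional value is 0.5: the expert's choice of action carries information about `u`, so a learner that treats $P(r \mid a)$ as the effect of acting would overestimate its own expected reward.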

#### Author Information

##### Maxime Gasse (Polytechnique Montréal)

I am a machine learning researcher within the Data Science for Real-Time Decision Making Canada Excellence Research Chair (CERC), and also part of the MILA research institute on artificial intelligence in Montréal, Canada. The question that motivates my research is: can machines think? My broad research interests include:

- probabilistic graphical models and their theoretical properties (my PhD thesis)
- structured prediction, in particular multi-label classification
- combinatorial optimization using machine learning (see our Ecole library)
- causality, specifically in the context of reinforcement learning