Skip to yearly menu bar Skip to main content

Workshop: Deep Reinforcement Learning

Understanding the Effects of Dataset Composition on Offline Reinforcement Learning

Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter


The promise of Offline Reinforcement Learning (RL) lies in learning policies from fixed datasets, without interacting with the environment. Being unable to interact makes the dataset one of the most essential ingredient of the algorithm and has a large influence on the performance of the learned policy. Studies on how the dataset composition influences various Offline RL algorithms are missing currently. Towards that end, we conducted a comprehensive empirical analysis on the effect of dataset composition towards the performance of Offline RL algorithms for discrete action environments. The performance is studied through two metrics of the datasets, Trajectory Quality (TQ) and State-Action Coverage (SACo). Our analysis suggests that variants of the off-policy Deep-Q-Network family rely on the dataset to exhibit high SACo. Contrary to that, algorithms that constrain the learned policy towards the data generating policy perform well across datasets, if they exhibit high TQ or SACo or both. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.

Chat is not available.