

Oral in Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning?

Understanding the Effects of Dataset Composition on Offline Reinforcement Learning

Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Angela Bitto · Philipp Renz · Vihang Patil · Sepp Hochreiter


Abstract:

The promise of Offline Reinforcement Learning (RL) lies in learning policies from fixed datasets, without interacting with the environment. Because no interaction is possible, the dataset becomes the most essential ingredient of the algorithm, as it directly determines which policies can be learned. However, systematic studies of how dataset composition influences Offline RL algorithms are missing. Towards that end, we conducted a comprehensive empirical analysis of the effect of dataset composition on the performance of Offline RL algorithms in discrete action environments. We characterize datasets through two metrics: Trajectory Quality (TQ) and State-Action Coverage (SACo). Our analysis suggests that variants of the off-policy Deep Q-Network family rely on the dataset exhibiting high SACo. In contrast, algorithms that constrain the learned policy towards the data-generating policy perform well across datasets that exhibit high TQ, high SACo, or both. For datasets with high TQ, Behavior Cloning outperforms or performs on par with the best Offline RL algorithms.
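
The abstract does not define how TQ and SACo are computed; the sketch below is a minimal illustration of how such dataset metrics could plausibly be measured, assuming TQ is a mean trajectory return normalized between random-policy and expert reference returns, and SACo is the count of unique state-action pairs relative to a reference count. These definitions, function names, and baseline values are assumptions for illustration, not the authors' method.

```python
import numpy as np


def trajectory_quality(trajectory_returns, random_return, expert_return):
    """Hypothetical TQ: mean trajectory return in the dataset, normalized
    between a random-policy baseline and an expert reference return."""
    mean_return = np.mean(trajectory_returns)
    return (mean_return - random_return) / (expert_return - random_return)


def state_action_coverage(states, actions, reference_unique_pairs):
    """Hypothetical SACo: number of unique (state, action) pairs in the
    dataset, relative to a reference count (e.g. from an online run)."""
    unique_pairs = {(tuple(s), int(a)) for s, a in zip(states, actions)}
    return len(unique_pairs) / reference_unique_pairs


# Toy usage on a synthetic dataset with discrete states and actions.
rng = np.random.default_rng(0)
states = rng.integers(0, 50, size=(1000, 2))    # 2-dim discrete observations
actions = rng.integers(0, 4, size=1000)         # 4 discrete actions
returns = rng.normal(5.0, 1.0, size=20)         # per-trajectory returns

print("TQ  :", trajectory_quality(returns, random_return=0.0, expert_return=10.0))
print("SACo:", state_action_coverage(states, actions, reference_unique_pairs=4000))
```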