Understanding the Effects of Dataset Composition on Offline Reinforcement Learning
Kajetan Schweighofer · Markus Hofmarcher · Marius-Constantin Dinu · Philipp Renz · Angela Bitto · Vihang Patil · Sepp Hochreiter
Event URL: https://openreview.net/forum?id=A4EWtf-TO3Y

The promise of Offline Reinforcement Learning (RL) lies in learning policies from fixed datasets, without interacting with the environment. Because no interaction is possible, the dataset is one of the most essential ingredients of the algorithm and has a large influence on the performance of the learned policy. Studies on how dataset composition influences Offline RL algorithms are currently missing. Towards that end, we conducted a comprehensive empirical analysis of the effect of dataset composition on the performance of Offline RL algorithms in discrete action environments. Performance is studied through two dataset metrics, Trajectory Quality (TQ) and State-Action Coverage (SACo). Our analysis suggests that variants of the off-policy Deep Q-Network family rely on the dataset exhibiting high SACo. In contrast, algorithms that constrain the learned policy towards the data-generating policy perform well across datasets, as long as they exhibit high TQ, high SACo, or both. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
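For concreteness, below is a minimal Python sketch of how such dataset metrics might be computed. It assumes TQ is a min-max-normalized mean trajectory return and SACo a count of unique state-action pairs normalized against a reference dataset; these definitions, function names, and normalization constants are illustrative assumptions, not necessarily the paper's exact formulation.

```python
# Sketch of the two dataset metrics described in the abstract, under assumed
# definitions: Trajectory Quality (TQ) as mean trajectory return, min-max
# normalized against reference returns, and State-Action Coverage (SACo) as
# the number of unique (state, action) pairs relative to a reference dataset.
# The paper's exact normalization may differ; names here are hypothetical.
from typing import Hashable, List, Sequence, Tuple

Transition = Tuple[Hashable, int, float]  # (state, action, reward)
Trajectory = List[Transition]


def trajectory_quality(dataset: Sequence[Trajectory],
                       min_return: float, max_return: float) -> float:
    """Mean trajectory return, min-max normalized to [0, 1] against
    reference returns (e.g. of a random and an expert policy)."""
    returns = [sum(r for _, _, r in traj) for traj in dataset]
    mean_return = sum(returns) / len(returns)
    return (mean_return - min_return) / (max_return - min_return)


def state_action_coverage(dataset: Sequence[Trajectory],
                          reference_count: int) -> float:
    """Number of unique (state, action) pairs in the dataset, normalized
    by the count found in a reference dataset."""
    unique_pairs = {(s, a) for traj in dataset for s, a, _ in traj}
    return len(unique_pairs) / reference_count
```

Under these assumed definitions, a dataset of expert trajectories would score high on TQ but, being narrow, low on SACo, while a dataset from a random policy would show the opposite profile.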

Author Information

Kajetan Schweighofer (Johannes Kepler University Linz)
Markus Hofmarcher (ELLIS Unit / University Linz)
Marius-Constantin Dinu (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Dynatrace Research)
Philipp Renz (LIT AI Lab - JKU Linz)
Angela Bitto (JKU)
Vihang Patil (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
Sepp Hochreiter (LIT AI Lab / University Linz)

Sepp Hochreiter is head of the LIT AI Lab and Professor of Bioinformatics at the University of Linz. He was the first to identify and analyze the vanishing gradient problem, the fundamental problem of deep learning, in 1991, and is first author of the main paper on the now widely used LSTM RNNs. He implemented 'learning how to learn' (meta-learning) networks via LSTM RNNs and has applied Deep Learning and RNNs to self-driving cars, sentiment analysis, reinforcement learning, bioinformatics, and medicine.
