Workshop: Offline Reinforcement Learning

Example-Based Offline Reinforcement Learning without Rewards

Kyle Hatch · Tianhe Yu · Rafael Rafailov · Chelsea Finn


Offline reinforcement learning (RL) methods, which tackle the problem of learning a policy from a static dataset, have shown promise in deploying RL in real-world scenarios. Offline RL allows the reuse and accumulation of large datasets while mitigating the safety concerns that arise in online exploration. However, prior works require human-defined reward labels to learn from offline datasets. Reward specification remains a major challenge for deep RL algorithms and also poses an issue for offline RL in the real world, since designing reward functions can take considerable manual effort and may require installing extra hardware, such as visual sensors on robots, to detect the completion of a task. In contrast, in many settings it is easier for users to provide examples of a completed task, such as images, than to specify a complex reward function. Based on this observation, we propose an algorithm that can learn behaviors from offline datasets without reward labels, instead using a small number of example images. Our method trains a conservative classifier that directly learns a Q-function from the offline dataset and the successful examples while penalizing the Q-values to prevent distributional shift. Through extensive empirical results, we find that our method outperforms prior imitation learning algorithms and inverse RL methods that directly learn rewards by 53% in vision-based robot manipulation domains.
