Demystifying Emergent Exploration in Goal-conditioned RL
Abstract
In this work, we take a first step toward uncovering the underlying dynamics of emergent exploration in unsupervised reinforcement learning. We study Single-Goal Contrastive RL (SGCRL), an algorithm capable of solving challenging robotic manipulation tasks without external rewards or curricula. Drawing on methods from cognitive science, we combine theoretical analysis of the algorithm's objective function with controlled experiments to better understand what drives its behavior. We show that SGCRL implicitly maximizes rewards shaped by its learned representations. These contrastive representations adapt the reward landscape to promote exploration before the goal is reached and exploitation thereafter. We also build a simple model of the algorithm without function approximation, isolating the essential components responsible for its exploratory behavior. Finally, we establish connections between SGCRL's exploration dynamics and classical exploration methods, including R-MAX and PSRL.