
Foundation Models for Semantic Novelty in Reinforcement Learning
Tarun Gupta · Peter Karkus · Tong Che · Danfei Xu · Marco Pavone
Event URL: https://openreview.net/forum?id=ryn1BfL4Rf

Effectively exploring the environment is a key challenge in reinforcement learning (RL). We address this challenge by defining a novel intrinsic reward based on a foundation model, such as contrastive language-image pre-training (CLIP), which encodes a wealth of domain-independent semantic visual-language knowledge about the world. Specifically, our intrinsic reward is computed from pre-trained CLIP embeddings without any fine-tuning or learning on the target RL task. We demonstrate that CLIP-based intrinsic rewards can drive exploration towards semantically meaningful states and outperform state-of-the-art methods in challenging sparse-reward, procedurally generated environments.
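The abstract does not specify the exact reward formulation, but one common way to turn frozen embeddings into a semantic-novelty signal is an episodic bonus: embed each observation with the pre-trained encoder and reward states whose embedding is far (in cosine distance) from anything seen so far in the episode. The sketch below illustrates this idea with plain NumPy; the `encode` step standing in for a frozen CLIP image encoder is a placeholder assumption, not the authors' implementation.

```python
import numpy as np

def novelty_reward(embedding, memory, eps=1e-8):
    """Illustrative episodic semantic-novelty bonus: 1 minus the maximum
    cosine similarity between the current embedding and all embeddings
    already stored in the episodic memory. Returns 1.0 for the first state."""
    if not memory:
        return 1.0
    e = embedding / (np.linalg.norm(embedding) + eps)
    sims = [float(e @ (m / (np.linalg.norm(m) + eps))) for m in memory]
    return 1.0 - max(sims)

# Toy 2-D "embeddings" standing in for frozen CLIP image-encoder outputs.
memory = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

r_seen  = novelty_reward(np.array([1.0, 0.0]), memory)   # revisited state
r_novel = novelty_reward(np.array([-1.0, 0.0]), memory)  # semantically new state
```

A revisited state yields a near-zero bonus while a semantically distinct one yields a large bonus, so adding this term to the task reward biases the policy toward unexplored regions of the embedding space without any learning on the target task.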

Author Information

Tarun Gupta (University of Oxford)
Peter Karkus (NVIDIA Research)

I am a researcher in machine learning and robotics with a long-term vision of building human-level robot intelligence. My research focuses on autonomous vehicles and embodied AI that combines learning with structure and reasoning. My recent works are on neural networks that encode classic robot algorithms in order to learn partially observable planning, visual navigation, localization, and mapping tasks.

Tong Che (MILA, Montreal)
Danfei Xu (Georgia Institute of Technology)
Marco Pavone (Stanford University)
