
Workshop: Goal-Conditioned Reinforcement Learning

An Investigation into Value-Implicit Pre-training for Task-Agnostic, Sample-Efficient Reinforcement Learning

Samyeul Noh · Seonghyun Kim · Ingook Jang

Keywords: goal-conditioned reinforcement learning · robotic manipulation · value-implicit pre-training


One of the primary challenges in learning a diverse set of robotic manipulation skills from raw sensory observations is learning a universal reward function that transfers to unseen tasks. To address this challenge, a recent approach called value-implicit pre-training (VIP) has been proposed. VIP provides a self-supervised pre-trained visual representation capable of generating dense and smooth reward functions for unseen robotic tasks. In this paper, we explore the feasibility of VIP's goal-conditioned reward specification for achieving task-agnostic, sample-efficient reinforcement learning (RL). We evaluate online RL using VIP-generated rewards, in place of hand-crafted reward signals, on goal-image-specified robotic manipulation tasks from Meta-World under a highly limited interaction budget. We find that VIP's goal-conditioned reward specification, with its task-agnostic features, accelerates online RL when combined with sparse task-completion rewards after the policy is pre-trained on a handful of demonstrations via behavior cloning, rather than when used alone.
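The reward scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed` is a hypothetical stand-in for a frozen VIP visual encoder, and the shaped reward assumes the common formulation in which the dense goal-conditioned reward is the per-step change in negative embedding distance to the goal image, added to a sparse task-completion bonus.

```python
import numpy as np

def embed(obs, W):
    # Hypothetical stand-in for a frozen VIP visual encoder:
    # maps a raw image observation to an embedding vector.
    return np.tanh(W @ obs.ravel())

def vip_reward(obs, next_obs, goal, W):
    # Dense goal-conditioned reward: change in negative embedding
    # distance to the goal image across one environment step.
    d = np.linalg.norm(embed(obs, W) - embed(goal, W))
    d_next = np.linalg.norm(embed(next_obs, W) - embed(goal, W))
    return d - d_next  # positive when the agent moves closer to the goal

def shaped_reward(obs, next_obs, goal, W, task_done, bonus=1.0):
    # Combination the paper finds effective: dense VIP-style reward
    # plus a sparse task-completion bonus (after BC pre-training).
    return vip_reward(obs, next_obs, goal, W) + (bonus if task_done else 0.0)
```

Here the encoder weights `W` and the bonus magnitude are placeholders; in the paper's setting the encoder is the pre-trained VIP representation and the sparse signal is Meta-World's task-completion indicator.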
