

Poster

Pre-Trained Multi-Goal Transformers with Prompt Optimization for Efficient Online Adaptation

Haoqi Yuan · Yuhui Fu · Feiyang Xie · Zongqing Lu

Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Efficiently solving unseen tasks remains a challenge in reinforcement learning (RL), especially for long-horizon tasks composed of multiple subtasks. Pre-training policies on task-agnostic datasets has emerged as a promising approach, yet existing methods still require substantial RL interactions to learn new tasks. We introduce MGPO, a method that leverages Transformer-based policies to model sequences of goals, enabling efficient online adaptation through prompt optimization. In its pre-training phase, MGPO combines hindsight multi-goal relabeling with behavior cloning, equipping the policy to model diverse long-horizon behaviors aligned with varying goal sequences. During online adaptation, the goal sequence, treated as a prompt, is optimized to improve task performance. We frame this process as a multi-armed bandit problem, selecting prompts based on the returns of online trajectories. Experiments across various environments demonstrate that MGPO offers substantial advantages over existing methods in sample efficiency, online adaptation performance, robustness, and interpretability.
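The bandit view of prompt optimization admits a compact illustration. Below is a minimal sketch in Python, assuming a discrete set of candidate goal-sequence prompts and a `rollout_return` helper that runs one episode with the pre-trained policy conditioned on a prompt; the UCB1 strategy shown here is an illustrative choice, not necessarily the paper's exact bandit algorithm.

```python
import math
import random

def rollout_return(prompt):
    """Placeholder (assumed helper): run one episode with the pre-trained
    policy conditioned on `prompt` and return the episode return.
    Replace with a real environment rollout."""
    return random.random()

def ucb1_prompt_optimization(prompts, num_episodes):
    """Treat each candidate goal-sequence prompt as a bandit arm and
    select prompts with UCB1 based on observed episode returns."""
    counts = [0] * len(prompts)   # times each prompt has been tried
    means = [0.0] * len(prompts)  # running mean return per prompt

    for t in range(1, num_episodes + 1):
        if t <= len(prompts):
            arm = t - 1  # try every prompt once before exploiting
        else:
            # UCB1: mean return plus an exploration bonus
            arm = max(
                range(len(prompts)),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = rollout_return(prompts[arm])
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update

    best = max(range(len(prompts)), key=lambda i: means[i])
    return prompts[best], means[best]
```

In this framing, each online trajectory doubles as one bandit pull, so adaptation requires no gradient updates to the pre-trained policy itself, which is consistent with the sample-efficiency claim in the abstract.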
