Contextual Transformer for Offline Meta Reinforcement Learning
Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang
Event URL: https://openreview.net/forum?id=FKzaFAj8mV8

Recently, the pretrain-tuning paradigm for large-scale sequence models has driven significant progress in Natural Language Processing and Computer Vision. In Reinforcement Learning (RL), however, this paradigm is still hindered by intractable challenges, including the lack of self-supervised, large-scale pretraining methods for offline data and of efficient fine-tuning/prompt-tuning methods for unseen downstream tasks. In this work, we explore how prompts can help sequence-modeling-based offline Reinforcement Learning (offline RL) algorithms. First, we propose prompt tuning for offline RL, where a sequence of context vectors is concatenated with the input to guide conditional generation. This lets us pretrain a model on the offline dataset with a supervised loss and then learn a prompt that guides the policy toward the desired actions. Second, we extend the framework to the Meta-RL setting and propose the Contextual Meta Transformer (CMT), which leverages context across different tasks as the prompt to improve performance on unseen tasks. We conduct extensive experiments in three offline RL settings: offline single-agent RL on the D4RL dataset, offline Meta-RL on the MuJoCo benchmark, and offline multi-agent RL (MARL) on the SMAC benchmark. The results validate the strong performance, high computational efficiency, and generality of our methods.
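To make the prompt-tuning idea concrete, here is a minimal Python sketch built on loudly stated assumptions. The "pretrained" model is a hypothetical frozen toy policy (mean pooling over token vectors followed by a fixed linear map `W` to action logits), not the transformer used in this work. The prompt is a small set of learnable vectors concatenated in front of every input sequence and trained with plain gradient descent on a cross-entropy loss toward a target action. The names `frozen_policy_logits`, `tune_prompt`, and `greedy_action`, the toy data, and all hyperparameters are illustrative only. Only the prompt is updated; `W` stays frozen throughout.

```python
import math
import random


def frozen_policy_logits(W, sequence):
    """Hypothetical frozen 'pretrained' policy: mean-pool the token vectors,
    then map the pooled vector to action logits with the fixed matrix W."""
    length = len(sequence)
    dim = len(sequence[0])
    pooled = [sum(tok[i] for tok in sequence) / length for i in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits):
    return max(range(len(logits)), key=lambda a: logits[a])


def tune_prompt(W, inputs, target_action, prompt_len=2, lr=0.5, steps=200, seed=0):
    """Learn `prompt_len` vectors that are concatenated in front of every input
    sequence so the frozen policy prefers `target_action`. W is never modified."""
    rng = random.Random(seed)
    dim = len(W[0])
    prompt = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(prompt_len)]
    for _ in range(steps):
        # With mean pooling, every prompt vector receives the same gradient.
        grad = [0.0] * dim
        for x in inputs:
            seq = prompt + x  # context (prompt) vectors followed by the input tokens
            probs = softmax(frozen_policy_logits(W, seq))
            # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target)
            dlogits = [p - (1.0 if a == target_action else 0.0) for a, p in enumerate(probs)]
            # Chain rule through the linear map and the 1/len(seq) pooling weight
            for i in range(dim):
                grad[i] += sum(dlogits[a] * W[a][i] for a in range(len(W))) / len(seq)
        for vec in prompt:
            for i in range(dim):
                vec[i] -= lr * grad[i] / len(inputs)
    return prompt


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # frozen weights: 3 actions, 2-dim tokens
    inputs = [[[1.0, 0.0], [0.5, 0.0]], [[0.8, 0.1]]]  # toy input sequences
    prompt = tune_prompt(W, inputs, target_action=1)
    for x in inputs:
        # greedy action without the prompt vs. with the learned prompt
        print(greedy_action(frozen_policy_logits(W, x)),
              greedy_action(frozen_policy_logits(W, prompt + x)))
```

Because the backbone is frozen, the only trainable parameters are the `prompt_len * d` prompt entries, which is why prompt tuning is cheap compared with full fine-tuning. In this toy model every prompt vector receives the same gradient because mean pooling treats positions symmetrically; an attention-based sequence model would not have that simplification.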

Author Information

Runji Lin (Institute of Automation, Chinese Academy of Sciences)

Runji Lin is a master's student at the Institute of Automation, Chinese Academy of Sciences (CASIA). His research areas include reinforcement learning, game theory, and multi-agent systems.

Ye Li (Nankai University)
Xidong Feng (University College London)
Zhaowei Zhang (Wuhan University)
XIAN HONG WU FUNG (Peking University)
Haifeng Zhang (Institute of Automation, Chinese Academy of Sciences)
Jun Wang (UCL)
Yali Du (King's College London)
Yaodong Yang (AIG)
