

On the Rollout-Training Mismatch in Modern RL Systems

Feng Yao · Liyuan Liu · Dinghuai Zhang · Chengyu Dong · Jingbo Shang · Jianfeng Gao

Abstract
