

Efficient RL Training for Reasoning Models via Length-Aware Optimization

Danlong Yuan ⋅ Tian Xie ⋅ Shaohan Huang ⋅ Zhuocheng Gong ⋅ Huishuai Zhang ⋅ Chong Luo ⋅ Furu Wei ⋅ Dongyan Zhao

Abstract
