Poster

Pre-Trained Policy Discriminators are General Reward Models

Shihan Dou ⋅ Shichun Liu ⋅ Yuming Yang ⋅ Yicheng Zou ⋅ Yunhua Zhou ⋅ Shuhao Xing ⋅ Chenhao Huang ⋅ Qiming Ge ⋅ Haijun Lv ⋅ Demin Song ⋅ Songyang Gao ⋅ Chengqi Lyu ⋅ Enyu Zhou ⋅ Honglin Guo ⋅ Zhiheng Xi ⋅ Qipeng Guo ⋅ Wenwei Zhang ⋅ Tao Gui ⋅ Qi Zhang ⋅ Xipeng Qiu ⋅ Xuanjing Huang ⋅ Kai Chen
2025 Poster

Abstract
