Poster Wed, Dec 3, 2025 • 11:00 AM – 2:00 PM PST

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

Jiahao Wang ⋅ Weiye Xu ⋅ Aijun Yang ⋅ Wengang Zhou ⋅ Lewei Lu ⋅ Houqiang Li ⋅ Xiaohua Wang ⋅ Jinguo Zhu
