Poster Fri, Dec 5, 2025 • 11:00 AM – 2:00 PM PST

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Jiarui Yao ⋅ Yifan Hao ⋅ Hanning Zhang ⋅ Hanze Dong ⋅ Wei Xiong ⋅ Nan Jiang ⋅ Tong Zhang

Abstract
