

Poster

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Jiarui Yao ⋅ Yifan Hao ⋅ Hanning Zhang ⋅ Hanze Dong ⋅ Wei Xiong ⋅ Nan Jiang ⋅ Tong Zhang
2025 Poster

Abstract

