Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

Shuyao Xu ⋅ Cheng Peng ⋅ Jiangxuan Long ⋅ Weidi Xu ⋅ Wei Chu ⋅ Yuan Qi

Abstract
