Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead

Feiyang Kang · Michael Kuchnik · Karthik Padthe · Marin Vlastelica · Ruoxi Jia · Carole-Jean Wu · Newsha Ardalani

Abstract
