Poster

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Zhihan Liu ⋅ Miao Lu ⋅ Shenao Zhang ⋅ Boyi Liu ⋅ Hongyi Guo ⋅ Yingxiang Yang ⋅ Jose Blanchet ⋅ Zhaoran Wang
2024 Poster
