Skip to yearly menu bar Skip to main content


Poster

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Zhihan Liu · Miao Lu · Shenao Zhang · Boyi Liu · Hongyi Guo · Yingxiang Yang · Jose Blanchet · Zhaoran Wang
2024 Poster

Abstract

Video

Chat is not available.