Oral Poster

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov ⋅ Archit Sharma ⋅ Eric Mitchell ⋅ Christopher D Manning ⋅ Stefano Ermon ⋅ Chelsea Finn
Outstanding Paper Runner-up
2023 Oral Poster

Abstract
