
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov · Archit Sharma · Eric Mitchell · Christopher D Manning · Stefano Ermon · Chelsea Finn
2023 Oral

