Poster

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Hamish Ivison · Yizhong Wang · Jiacheng Liu · Zeqiu Wu · Valentina Pyatkin · Nathan Lambert · Noah Smith · Yejin Choi · Hanna Hajishirzi
2024 Poster
