P3O: Pessimistic Preference-based Policy Optimization for Robust Alignment from Preferences

Dhawal Gupta ⋅ Christoph Dann ⋅ Alekh Agarwal

Abstract
