

P3O: Pessimistic Preference-based Policy Optimization for Robust Alignment from Preferences

Dhawal Gupta · Christoph Dann · Alekh Agarwal

Abstract
