Poster

Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?

Paul Gölz ⋅ Nika Haghtalab ⋅ Kunhe Yang
2025 Poster

Abstract
