

Poster in Workshop: Safe Generative AI

Anchored Optimization and Contrastive Revisions: Addressing Reward Hacking in Alignment

Karel Doosterlinck · Winnie Xu · Chris Develder · Thomas Demeester · Amanpreet Singh · Christopher Potts · Douwe Kiela · Shikib Mehri

Abstract
