Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz ⋅ Aaditya Singh ⋅ DJ Strouse ⋅ Tuomas Sandholm ⋅ Russ Salakhutdinov ⋅ Anca Dragan ⋅ Stephen McAleer

Abstract
