Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz · Aaditya Singh · DJ Strouse · Tuomas Sandholm · Russ Salakhutdinov · Anca Dragan · Stephen McAleer

Abstract
