Reward Model Ensembles Help Mitigate Overoptimization

Thomas Coste · Usman Anwar · Robert Kirk · David Krueger

Abstract
