Reward Model Overoptimisation in Iterated RLHF

Lorenz Wolf · Robert Kirk · Mirco Musolesi

Abstract
