Reward Model Overoptimisation in Iterated RLHF

Lorenz Wolf ⋅ Robert Kirk ⋅ Mirco Musolesi

Abstract