

Learning a Pessimistic Reward in RLHF: KL Regularization is Not Necessary

Yinglun Xu · Hangoo Kang · Tarun Suresh · Yuxuan Wan · Gagandeep Singh

Abstract
