Learning a Pessimistic Reward in RLHF: KL Regularization is Not Necessary

Yinglun Xu ⋅ Hangoo Kang ⋅ Tarun Suresh ⋅ Yuxuan Wan ⋅ Gagandeep Singh

Abstract
