Poster

InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling

Yuchun Miao · Sen Zhang · Liang Ding · Rong Bao · Lefei Zhang · Dacheng Tao
2024 Poster

Abstract
