Optimizing Reward Models with Proximal Policy Exploration in Preference-Based Reinforcement Learning

Yiwen Zhu · Jinyi Liu · Yifu Yuan · Wenya Wei · Zhenxing Ge · Qianyi Fu · Zhou Fang · Yujing Hu · Bo An

Abstract
