14 Results
Poster | Fri 11:00 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh · Yifan Hu · Iason Chaimalas · Viraj Mehta · Pier Giuseppe Sessa · Haitham Bou Ammar · Ilija Bogunovic
Workshop | | Generative Verifiers: Reward Modeling as Next-Token Prediction | Lunjun Zhang · Arian Hosseini · Hritik Bansal · Mehran Kazemi · Aviral Kumar · Rishabh Agarwal