16 Results
Workshop
|
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF Heyang Zhao · Chenlu Ye · Quanquan Gu · Tong Zhang |
|
Workshop
|
Sat 12:00 |
Uncertainty-Penalized Directed Preference Optimization Sam Houliston · Alexander Immer · Alizée Pace · Gunnar Rätsch |
|
Workshop
|
Sat 15:45 |
Back-to-Basics Revisited: Benchmarking an Expanded Set of RLHF Algorithms Lucas Spangher · Rama Kumar Pasumarthi · Nick Masiewicki · Peter Grabowski · Eugene Ie · William Arnold · Daniele Calandriello · Bilal Piot |
|
Workshop
|
Understanding and Alleviating Memory Issue in RLHF for LLMs Jin Zhou · Hanmei Yang · Steven Jiaxun Tang · Mingcan Xiang · Hui Guan · Tongping Liu |