9 Results
Workshop | Let's Reinforce Step by Step | Sarah Pan · Vladislav Lialin · Sherin Muckatira · Anna Rumshisky
Workshop | Fri 12:50 | #28: Canonical Design for Language Agents using Natural Language Reward Models | Silviu Pitis · Ziang Xiao · Alessandro Sordoni
Workshop | Confronting Reward Model Overoptimization with Constrained RLHF | Ted Moskovitz · Aaditya Singh · DJ Strouse · Tuomas Sandholm · Russ Salakhutdinov · Anca Dragan · Stephen McAleer
Workshop | Understanding Hidden Context in Preference Learning: Consequences for RLHF | Anand Siththaranjan · Cassidy Laidlaw · Dylan Hadfield-Menell
Workshop | Delve into PPO: Implementation Matters for Stable RLHF | Rui Zheng · Shihan Dou · Songyang Gao · Yuan Hua · Wei Shen · Binghai Wang · Yan Liu · Senjie Jin · Yuhao Zhou · Limao Xiong · Lu Chen · Zhiheng Xi · Nuo Xu · Wenbin Lai · Minghao Zhu · Haoran Huang · Tao Gui · Qi Zhang · Xuanjing Huang
Workshop | Understanding the Effects of RLHF on LLM Generalisation and Diversity | Robert Kirk · Ishita Mediratta · Christoforos Nalmpantis · Jelena Luketina · Eric Hambro · Edward Grefenstette · Roberta Raileanu
Workshop | Reward Model Ensembles Help Mitigate Overoptimization | Thomas Coste · Usman Anwar · Robert Kirk · David Krueger
Workshop | Diversity from Human Feedback | Ren-Jian Wang · Ke Xue · Yutong Wang · Peng Yang · Haobo Fu · Qiang Fu · Chao Qian
Poster | Thu 15:00 | Is RLHF More Difficult than Standard RL? A Theoretical Perspective | Yuanhao Wang · Qinghua Liu · Chi Jin