Search All 2023 Events
9 Results

Workshop
Let's Reinforce Step by Step
Sarah Pan · Vladislav Lialin · Sherin Muckatira · Anna Rumshisky
Workshop
Fri 12:50 #28: Canonical Design for Language Agents using Natural Language Reward Models
Silviu Pitis · Ziang Xiao · Alessandro Sordoni
Workshop
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz · Aaditya Singh · DJ Strouse · Tuomas Sandholm · Russ Salakhutdinov · Anca Dragan · Stephen McAleer
Workshop
Understanding Hidden Context in Preference Learning: Consequences for RLHF
Anand Siththaranjan · Cassidy Laidlaw · Dylan Hadfield-Menell
Workshop
Delve into PPO: Implementation Matters for Stable RLHF
Rui Zheng · Shihan Dou · Songyang Gao · Yuan Hua · Wei Shen · Binghai Wang · Yan Liu · Senjie Jin · Yuhao Zhou · Limao Xiong · Lu Chen · Zhiheng Xi · Nuo Xu · Wenbin Lai · Minghao Zhu · Haoran Huang · Tao Gui · Qi Zhang · Xuanjing Huang
Workshop
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk · Ishita Mediratta · Christoforos Nalmpantis · Jelena Luketina · Eric Hambro · Edward Grefenstette · Roberta Raileanu
Workshop
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste · Usman Anwar · Robert Kirk · David Krueger
Workshop
Diversity from Human Feedback
Ren-Jian Wang · Ke Xue · Yutong Wang · Peng Yang · Haobo Fu · Qiang Fu · Chao Qian
Poster
Thu 15:00 Is RLHF More Difficult than Standard RL? A Theoretical Perspective
Yuanhao Wang · Qinghua Liu · Chi Jin