35 Results
Workshop | | Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically | Kefan Dong · Arvind Mahankali · Tengyu Ma
Workshop | | Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack | Leo McKee-Reid · Christoph Sträter · Maria Martinez · Joe Needham · Mikita Balesni
Poster | Thu 11:00 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang · Ruomeng Ding · Yong Lin · Huan Zhang · Tong Zhang
Workshop | | Improving LLM Generation with Inverse and Forward Alignment: Reward Modeling, Prompting, Fine-Tuning, and Inference-Time Optimization | Hao Sun · Thomas Pouplin · Nicolás Astorga · Tennison Liu · Mihaela van der Schaar
Workshop | | Mechanism Design for LLM Fine-tuning with Multiple Reward Models | Haoran Sun · Yurong Chen · Siwei Wang · Wei Chen · Xiaotie Deng
Poster | Thu 11:00 | Learning Goal-Conditioned Representations for Language Reward Models | Vaskar Nath · Dylan Slack · Jeff Da · Yuntao Ma · Hugh Zhang · Spencer Whitehead · Sean Hendryx
Poster | Thu 16:30 | Calibrated Self-Rewarding Vision Language Models | Yiyang Zhou · Zhiyuan Fan · Dongjie Cheng · Sihan Yang · Zhaorun Chen · Chenhang Cui · Xiyao Wang · Yun Li · Linjun Zhang · Huaxiu Yao
Workshop | | Linear Probe Penalties Reduce LLM Sycophancy | Henry Papadatos · Rachel Freedman
Workshop | | Critique-out-Loud Reward Models | Zachary Ankner · Mansheej Paul · Brandon Cui · Jonathan Chang · Prithviraj Ammanabrolu
Workshop | | S2L-RM: Short-to-Long Reward Modeling | Changyu CHEN · Zichen Liu · Haonan Wang · Chao Du · Tianyu Pang · Qian Liu · Arunesh Sinha · Pradeep Varakantham · Min Lin
Poster | Fri 11:00 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh · Yifan Hu · Iason Chaimalas · Viraj Mehta · Pier Giuseppe Sessa · Haitham Bou Ammar · Ilija Bogunovic