10 Results
| Type | Session | Title | Authors |
|---|---|---|---|
| Workshop | | Disclosing the Biases in Large Language Models via Reward Structured Questions | Ezgi Korkmaz |
| Workshop | | Revealing the Bias in Large Language Models via Reward Structured Questions | Ezgi Korkmaz |
| Workshop | | Revealing the Bias in Large Language Models via Reward Structured Questions | Ezgi Korkmaz |
| Poster | Thu 9:00 | On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting | Tomasz Korbak · Hady Elsahar · Germán Kruszewski · Marc Dymetman |
| Poster | Wed 14:00 | Hedging as Reward Augmentation in Probabilistic Graphical Models | Debarun Bhattacharjya · Radu Marinescu |
| Poster | Tue 9:00 | Fine-tuning language models to find agreement among humans with diverse preferences | Michiel Bakker · Martin Chadwick · Hannah Sheahan · Michael Tessler · Lucy Campbell-Gillingham · Jan Balaguer · Nat McAleese · Amelia Glaese · John Aslanides · Matt Botvinick · Christopher Summerfield |
| Poster | Thu 9:00 | Defining and Characterizing Reward Gaming | Joar Skalse · Nikolaus Howe · Dmitrii Krasheninnikov · David Krueger |
| Poster | Wed 14:00 | Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning | Joseph Early · Tom Bewley · Christine Evers · Sarvapali Ramchurn |
| Poster | Wed 9:00 | Trade-off between Payoff and Model Rewards in Shapley-Fair Collaborative Machine Learning | Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet |
| Poster | Thu 9:00 | Learning General World Models in a Handful of Reward-Free Deployments | Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette |