7 Results
Poster
|
Wed 16:30 |
A theoretical case-study of Scalable Oversight in Hierarchical Reinforcement Learning Tom Yan · Zachary Lipton |
|
Poster
|
Wed 16:30 |
On scalable oversight with weak LLMs judging strong LLMs Zachary Kenton · Noah Siegel · Janos Kramar · Jonah Brown-Cohen · Samuel Albanie · Jannis Bulian · Rishabh Agarwal · David Lindner · Yunhao Tang · Noah Goodman · Rohin Shah |
|
Workshop
|
Algorithmic Oversight for Deceptive Reasoning Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak |
||
Workshop
|
Activation Monitoring: Advantages of Using Internal Representations for LLM Oversight Oam Patel · Rowan Wang |
||
Workshop
|
Algorithmic Oversight for Deceptive Reasoning Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak |
||
Workshop
|
Algorithmic Oversight for Deceptive Reasoning Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak |
||
Workshop
|
Modelling the oversight of deceptive interpretability agents Simon Lermen · Mateusz Dziemian |