Search All 2024 Events
 

7 Results

Poster
Wed 16:30 A theoretical case-study of Scalable Oversight in Hierarchical Reinforcement Learning
Tom Yan · Zachary Lipton
Poster
Wed 16:30 On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton · Noah Siegel · Janos Kramar · Jonah Brown-Cohen · Samuel Albanie · Jannis Bulian · Rishabh Agarwal · David Lindner · Yunhao Tang · Noah Goodman · Rohin Shah
Workshop
Algorithmic Oversight for Deceptive Reasoning
Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
Workshop
Activation Monitoring: Advantages of Using Internal Representations for LLM Oversight
Oam Patel · Rowan Wang
Workshop
Algorithmic Oversight for Deceptive Reasoning
Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
Workshop
Algorithmic Oversight for Deceptive Reasoning
Ege Onur Taga · Mingchen Li · Yongqi Chen · Samet Oymak
Workshop
Modelling the oversight of deceptive interpretability agents
Simon Lermen · Mateusz Dziemian