

Poster

Compact Proofs of Model Performance via Mechanistic Interpretability

Jason Gross · Rajashree Agrawal · Thomas Kwa · Euan Ong · Chun Hei Yip · Alex Gibson · Soufiane Noubir · Lawrence Chan
2024 Poster

