

Poster

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Adam Karvonen · Benjamin Wright · Can Rager · Rico Angell · Jannik Brinkmann · Logan Smith · Claudio Mayrink Verdun · David Bau · Samuel Marks
2024 Poster

Abstract
