Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

Kevin Wang ⋅ Alexandre Variengien ⋅ Arthur Conmy ⋅ Buck Shlegeris ⋅ Jacob Steinhardt
Poster

Abstract
