Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt
Poster

Abstract
