NeurIPS 2023

Workshop

Mining the Diamond Miner: Mechanistic Interpretability on the Video PreTraining Agent
Sonia Joseph · Artem Zholus · Mohammad Reza Samsami · Blake Richards

Workshop

Associative Memories with Heavy-Tailed Data
Vivien Cabannes · Elvis Dohmatob · Alberto Bietti

Workshop

Sat 14:07

Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Roland S. Zimmermann · Thomas Klein · Wieland Brendel

Poster

Thu 15:00

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Ziqian Zhong · Ziming Liu · Max Tegmark · Jacob Andreas

Poster

Wed 8:45

Scale Alone Does not Improve Mechanistic Interpretability in Vision Models
Roland S. Zimmermann · Thomas Klein · Wieland Brendel

Poster

Tue 15:15

AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning
Mohammadamin Tavakoli · Pierre Baldi · Ann Marie Carlton · Yin Ting Chiu · Alexander Shmakov · David Van Vranken

Workshop

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
Mansi Sakarvadia · Arham Khan · Aswathy Ajith · Daniel Grzenda · Nathaniel Hudson · André Bauer · Kyle Chard · Ian Foster

Poster

Tue 15:15

Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy · Augustine Mavor-Parker · Aengus Lynch · Stefan Heimersheim · Adrià Garriga-Alonso
