Fri 6:45 a.m. - 7:00 a.m.
|
Welcome and Opening Remarks
(
Remarks
)
>
SlidesLive Video
|
|
Fri 7:00 a.m. - 7:30 a.m.
|
Data attribution for LMMs and beyond (James Zou)
(
In-person presentation
)
>
SlidesLive Video
|
|
Fri 7:30 a.m. - 8:00 a.m.
|
What does scale give us: Why we are building a ladder to the moon (Sara Hooker)
(
In-person presentation
)
>
SlidesLive Video
|
|
Fri 8:00 a.m. - 8:30 a.m.
|
Coffee Break and Posters
|
|
Fri 8:30 a.m. - 9:05 a.m.
|
Contributed papers (4 presentations)
(
Contributed Talk
)
>
SlidesLive Video
|
Elan Rosenfeld · Rhys Gould · Nicholas Konz · Theodora Worledge
|
Fri 9:05 a.m. - 9:50 a.m.
|
The Future of Attribution in ML (Panel)
(
Discussion Panel
)
>
SlidesLive Video
|
|
Fri 9:50 a.m. - 11:00 a.m.
|
Lunch
|
|
Fri 11:00 a.m. - 12:00 p.m.
|
Poster Session #1
(
Poster Session
)
>
|
|
Fri 12:00 p.m. - 12:30 p.m.
|
What Neural Networks Memorize and Why (Vitaly Feldman)
(
In-person presentation
)
>
SlidesLive Video
|
|
Fri 12:30 p.m. - 1:00 p.m.
|
Evaluation Beyond Task Performance (Milad Nasr)
(
In-person presentation
)
>
SlidesLive Video
|
|
Fri 1:00 p.m. - 2:00 p.m.
|
Poster Session #2
(
Poster Session
)
>
|
|
Fri 1:00 p.m. - 1:30 p.m.
|
Coffee Break and Posters
|
|
Fri 2:00 p.m. - 2:30 p.m.
|
Understanding LLMs via their Generative Successes and Shortcomings (Swabha Swayamdipta)
(
In-person presentation
)
>
SlidesLive Video
|
|
Fri 2:30 p.m. - 3:00 p.m.
|
Talk by Sanjeev Arora
(
In-person presentation
)
>
SlidesLive Video
|
|
Fri 3:00 p.m. - 3:30 p.m.
|
Poster Session #3 & Closing Remarks
(
Poster Session
)
>
|
|
-
|
Irreducible Curriculum for Language Model Pretraining
(
Poster
)
>
link
|
Simin Fan · Martin Jaggi
|
-
|
Evaluating the Utility of Model Explanations for Model Development
(
Poster
)
>
link
|
Shawn Im · Jacob Andreas · Yilun Zhou
|
-
|
Why do landscape diagnostics matter? Pinpointing the failure mode of generalization
(
Poster
)
>
link
|
Yefan Zhou · Jianlong Chen · Qinxue Cao · Konstantin Schürholt · Yaoqing Yang
|
-
|
The Importance of Prompt Tuning for Automated Neuron Explanations
(
Poster
)
>
link
|
Justin Lee · Tuomas Oikarinen · Arjun Chatha · Keng-Chi Chang · Yilan Chen · Lily Weng
|
-
|
Copy Suppression: Comprehensively Understanding an Attention Head
(
Poster
)
>
link
|
Callum McDougall · Arthur Conmy · Cody Rushing · Tom McGrath · Neel Nanda
|
-
|
Does It Know?: Probing and Benchmarking Uncertainty in Language Model Latent Beliefs
(
Poster
)
>
link
|
Brian Huang · Joe Kwon
|
-
|
Attribution Patching Outperforms Automated Circuit Discovery
(
Poster
)
>
link
|
Aaquib Syed · Can Rager · Arthur Conmy
|
-
|
On the Support Vector Effect in DNNs: Rethinking Last Layer Sensitivity-based Instance Attribution
(
Poster
)
>
link
|
Syed Hasan Amin Mahmood · Rajiv Khanna
|
-
|
Training Dynamics of Contextual N-Grams in Language Models
(
Poster
)
>
link
|
Lucia Quirke · Lovis Heindrich · Wes Gurnee · Neel Nanda
|
-
|
SPADE: Sparsity-Guided Debugging for Deep Neural Networks
(
Poster
)
>
link
|
Arshia Soltani Moakhar · Eugenia Iofinova · Dan Alistarh
|
-
|
In Search of a Data Transformation that Accelerates Neural Field Training
(
Poster
)
>
link
SlidesLive Video
|
Junwon Seo · Sangyoon Lee · Jaeho Lee
|
-
|
Automatic Discovery of Visual Circuits
(
Poster
)
>
link
|
Achyuta Rajaram · Neil Chowdhury · Antonio Torralba · Jacob Andreas · Sarah Schwettmann
|
-
|
Mining the Diamond Miner: Mechanistic Interpretability on the Video PreTraining Agent
(
Poster
)
>
link
|
Sonia Joseph · Artem Zholus · Mohammad Reza Samsami · Blake Richards
|
-
|
Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation (Workshop Version)
(
Poster
)
>
link
|
Jiachen (Tianhao) Wang · Yuqing Zhu · Yu-Xiang Wang · Ruoxi Jia · Prateek Mittal
|
-
|
Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study
(
Poster
)
>
link
SlidesLive Video
|
Karolis Ramanauskas · Özgür Şimşek
|
-
|
Adversarial Attacks on Neuron Interpretation via Activation Maximization
(
Poster
)
>
link
|
Alex Fulleringer · Geraldin Nanfack · Jonathan Marty · Michael Eickenberg · Eugene Belilovsky
|
-
|
Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
(
Poster
)
>
link
SlidesLive Video
|
Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Ila Fiete · Andrey Gromov · Sanmi Koyejo
|
-
|
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
(
Poster
)
>
link
|
Lukas Berglund · Meg Tong · Maximilian Kaufmann · Mikita Balesni · Asa Cooper Stickland · Tomasz Korbak · Owain Evans
|
-
|
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
(
Poster
)
>
link
|
Samuel Marks · Max Tegmark
|
-
|
Language Models Linearly Represent Sentiment
(
Poster
)
>
link
|
Curt Tigges · Oskar John Hollinsworth · Atticus Geiger · Neel Nanda
|
-
|
Efficient Data Valuation for Weighted Nearest Neighbor Algorithms
(
Poster
)
>
link
|
Jiachen (Tianhao) Wang · Ruoxi Jia
|
-
|
How do language models bind entities in context?
(
Poster
)
>
link
|
Jiahai Feng · Jacob Steinhardt
|
-
|
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
(
Poster
)
>
link
|
Aleksandar Makelov · Georg Lange · Atticus Geiger · Neel Nanda
|
-
|
Object Detection in Deep Neural Networks Differs from Humans in the Periphery
(
Poster
)
>
link
|
Anne Harrington · Vasha DuTell · Mark Hamilton · Ayush Tewari · Simon Stent · Bill Freeman · Ruth Rosenholtz
|
-
|
Risk Aversion of Online Learning Algorithms
(
Poster
)
>
link
|
Andreas Haupt · Aroon Narayanan
|
-
|
Tell, Don't Show: Internalized Reasoning influences how LLMs generalize
(
Poster
)
>
link
|
Alexander Meinke · Owain Evans
|
-
|
Formal Definition of Fingerprints Improves Attribution of Generative Models
(
Poster
)
>
link
|
Hae Jin Song · Mahyar Khayatkhoei · Wael Abd-Almageed
|
-
|
Attributing Learned Concepts in Neural Networks to Training Data
(
Oral
)
>
link
|
Nicholas Konz · Charles Godfrey · Madelyn Shapiro · Jonathan Tu · Henry Kvinge · Davis Brown
|
-
|
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
(
Poster
)
>
link
|
Max Marion · Ahmet Üstün · Luiza A Pozzobon · Alex Wang · Marzieh Fadaee · Sara Hooker
|
-
|
A Simple and Efficient Baseline for Data Attribution on Images
(
Poster
)
>
link
|
Vasu Singla · Pedro Sandoval-Segura · Micah Goldblum · Jonas Geiping · Tom Goldstein
|
-
|
Shapley Interactions for Complex Feature Attribution
(
Poster
)
>
link
|
Divyansh Singhvi · Andrej Erkelens · Raghav Jain · Diganta Misra · Naomi Saphra
|
-
|
Sparse Autoencoders Find Highly Interpretable Features in Language Models
(
Poster
)
>
link
|
Hoagy Cunningham · Aidan Ewart · Logan Smith · Robert Huben · Lee Sharkey
|
-
|
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
(
Oral
)
>
link
|
Rhys Gould · Euan Ong · George Ogden · Arthur Conmy
|
-
|
Exploring Dataset-Scale Indicators of Data Quality
(
Poster
)
>
link
|
Benjamin Feuer · Chinmay Hegde
|
-
|
Self-Select: Optimizing Instruction Selection for Large Language Models
(
Poster
)
>
link
|
Alexander Kyimpopkin · Keshav Ramji
|
-
|
Speculative Behavior: An Approach to Large Language Model Evaluation and Optimization
(
Poster
)
>
link
SlidesLive Video
|
Hernan C. Vazquez · Jorge Sánchez · Rafael Carrascosa
|
-
|
Unifying Corroborative and Contributive Attributions in Large Language Models
(
Oral
)
>
link
|
Theodora Worledge · Judy Hanwen Shen · Nicole Meister · Caleb Winston · Carlos Guestrin
|
-
|
Algorithm Selection with Priority Order for Instances
(
Poster
)
>
link
|
Zhamilya Saparova · Martin Lukac
|
-
|
Better than Balancing: Debiasing through Data Attribution
(
Poster
)
>
link
|
Saachi Jain · Kimia Hamidieh · Kristian Georgiev · Marzyeh Ghassemi · Aleksander Madry
|
-
|
Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability
(
Poster
)
>
link
|
Arush Tagade · Jessica Rumbelow
|
-
|
Backtracking Mathematical Reasoning of Language Models to the Pretraining Data
(
Poster
)
>
link
|
Yasaman Razeghi · Hamish Ivison · Sameer Singh · Yanai Elazar
|
-
|
Intriguing Properties of Data Attribution on Diffusion Models
(
Poster
)
>
link
|
Xiaosen Zheng · Tianyu Pang · Chao Du · Jing Jiang · Min Lin
|
-
|
Forbidden Facts: An Investigation of Competing Objectives in Llama 2
(
Poster
)
>
link
SlidesLive Video
|
Tony Wang · Miles Wang · Kaivalya Hariharan · Nir Shavit
|
-
|
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
(
Poster
)
>
link
|
Fred Zhang · Neel Nanda
|
-
|
Meta- (out-of-context) learning in neural networks
(
Poster
)
>
link
|
Dmitrii Krasheninnikov · Egor Krasheninnikov · Bruno Mlodozeniec · David Krueger
|
-
|
Transformer-based Causal Language Models from a Meta-Learning Perspective
(
Poster
)
>
link
|
Xinbo Wu · Lav Varshney
|
-
|
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
(
Oral
)
>
link
|
Elan Rosenfeld · Andrej Risteski
|
-
|
Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism
(
Poster
)
>
link
|
Mansi Sakarvadia · Arham Khan · Aswathy Ajith · Daniel Grzenda · Nathaniel Hudson · André Bauer · Kyle Chard · Ian Foster
|
-
|
Estimating the Generalization in Deep Neural Networks via Sparsity
(
Poster
)
>
link
|
Yang Zhao · Hao Zhang · Xiuyuan Hu
|
-
|
Data Attribution for Segmentation Models
(
Poster
)
>
link
|
Albert Tam · Joshua Vendrow · Aleksander Madry
|
-
|
Summing Up the Facts: Additive Mechanisms behind Factual Recall in LLMs
(
Poster
)
>
link
|
Bilal Chughtai · Alan Cooney · Neel Nanda
|