NeurIPS 2023


Attributing Model Behavior at Scale (ATTRIB)

Tolga Bolukbasi · Logan Engstrom · Kelvin Guu · Andrew Ilyas · Sam Park · Ellie Pavlick · Anders Søgaard

Room 271 - 273
Workshop Website
Fri 15 Dec, 6:45 a.m. PST

Recently developed algorithmic innovations (e.g., transformers, diffusion models) and large-scale datasets (e.g., Common Crawl, LAION) have given rise to machine learning models with impressive capabilities. However, much remains to be understood about how these different factors combine to produce observed behaviors. For example, we still do not fully understand how the composition of training datasets influences downstream model capabilities (e.g., which data sources within LAION-5B are important for training high-quality CLIP embeddings?), how to attribute model capabilities to subcomponents inside the model (e.g., can we identify which subnetwork of an LLM implements addition?), and which algorithmic choices really drive performance (e.g., is RL necessary to align language models?). A common theme underlying all these challenges is model behavior attribution: the need to tie model behavior back to factors in the machine learning pipeline---such as the choice of training dataset or particular training algorithm---that we can control or reason about. This workshop aims to bring together researchers and practitioners who advance our understanding of model behavior attribution across data, models, and learning algorithms.
