

Workshop

Attributing Model Behavior at Scale (ATTRIB)

Elisa Nguyen · Sadhika Malladi · Andrew Ilyas · Logan Engstrom · Sam Park · Tolga Bolukbasi

Meeting 205 - 207

Sat 14 Dec, 8:15 a.m. PST

Recently developed algorithmic innovations (e.g., transformers, diffusion models, state-space models) and large-scale datasets (e.g., Common Crawl, LAION) have given rise to machine learning models with impressive capabilities. As the cost of training such large models grows, and as systems based on them are deployed widely, it is increasingly important to understand how different design choices combine to induce observed behaviors. For example, we still do not fully understand how the composition of training datasets influences model behavior (e.g., how does training on code data affect reasoning capabilities in other domains?), how to attribute capabilities to subcomponents (e.g., can we identify which subnetwork of an LLM implements addition?), and which algorithmic choices really drive performance (e.g., how can we best align models to human preferences?). Behavioral attribution is also important in light of recent concerns about harmful model behavior, and several works suggest that such behaviors can be attributed to training data or to model architecture and size.

The core challenge in all of these questions is that of model behavior attribution: relating model behavior back to factors in the machine learning pipeline, such as the choice of training dataset or training algorithm, that produced the model. This workshop aims to bring together researchers and practitioners who advance our understanding of model behavior attribution in contexts spanning data, model understanding, and algorithmic interventions.
