Workshop: UniReps: Unifying Representations in Neural Models

What Does Knowledge Distillation Distill?

Cindy Wu · Ekdeep S Lubana · Bruno Mlodozeniec · Robert Kirk · David Krueger

presentation: UniReps: Unifying Representations in Neural Models
Fri 15 Dec 6:15 a.m. PST — 3:15 p.m. PST


Knowledge distillation is an increasingly used compression method, driven by the popularity of large-scale models, but it is unclear whether all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of 'knowledge' in order to investigate how knowledge is transferred during distillation, focusing on shared invariances to counterfactual changes in dataset latent variables (which we call mechanisms). We define good stand-in student models for the teacher as models that share the teacher's mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods for obtaining such students. While these methods do not result in perfect transfer of mechanisms, they are likely to improve student fidelity or mitigate simplicity bias (as measured by teacher-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with certain spurious statistical correlations.
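To make two of the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes a teacher-student KL divergence on softmax outputs and a Jacobian matching penalty for toy linear models. All function names are hypothetical, the squared Frobenius norm is an assumed choice of penalty, and finite differences stand in for the automatic differentiation a real training setup would likely use.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fidelity_kl(teacher, student, x, eps=1e-12):
    # KL(teacher || student) on the predictive distributions for one input.
    # Lower values mean the student tracks the teacher more closely.
    p = softmax(teacher(x))
    q = softmax(student(x))
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def jacobian_fd(model, x, h=1e-5):
    # Forward finite-difference Jacobian of the logits with respect to the
    # input, returned as rows[output_index][input_index].
    # Assumption: a stand-in for autodiff, used only to stay dependency-free.
    base = model(x)
    columns = []
    for j in range(len(x)):
        bumped = list(x)
        bumped[j] += h
        out = model(bumped)
        columns.append([(o - b) / h for o, b in zip(out, base)])
    return [[columns[j][i] for j in range(len(x))] for i in range(len(base))]

def jacobian_matching_penalty(teacher, student, x):
    # Squared Frobenius norm of the difference between the two Jacobians
    # (an assumed form of the penalty, chosen for illustration).
    jt = jacobian_fd(teacher, x)
    js = jacobian_fd(student, x)
    return sum((a - b) ** 2 for rt, rs in zip(jt, js) for a, b in zip(rt, rs))

def linear_logits(weights, x):
    # Toy model: logits = W x, so the exact Jacobian is W itself.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Hypothetical toy teacher and student that differ in a single weight.
teacher = lambda x: linear_logits([[1.0, 0.0], [0.0, 1.0]], x)
student = lambda x: linear_logits([[1.0, 0.0], [0.0, 0.5]], x)
x = [0.3, -1.2]

print("teacher-student KL:", fidelity_kl(teacher, student, x))
print("Jacobian penalty:", jacobian_matching_penalty(teacher, student, x))
```

For linear models the Jacobian equals the weight matrix, so the penalty here reduces to the squared difference between the two weight matrices. How such a term would be weighted against other training objectives is left open in this sketch.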