

Talk in Workshop: Transfer Learning for Natural Language Processing

Fine-Tuning without Distortion: Improving Robustness to Distribution Shifts

Percy Liang · Ananya Kumar


Abstract:

Fine-tuning foundation models (such as BERT or CLIP) is one of the most successful ways to achieve high accuracy. But achieving high in-distribution accuracy is not enough: high-stakes applications such as self-driving cars, medical diagnosis, and poverty mapping also require models that generalize to circumstances not seen in the fine-tuning distribution. To examine this, we also evaluate models on out-of-distribution (OOD) test data. We show that standard full fine-tuning of all the model’s parameters can distort pretrained information and underperform OOD. Instead, we explain why selectively tuning parts of the model (e.g., prefixes, linear probes, embedding layers) can preserve pretrained information and lead to better OOD performance. Our analysis suggests a simple two-step strategy, linear probing then full fine-tuning (LP-FT), which improves pretrained features without distortion and leads to even higher accuracies. These works underscore the importance of preserving pretrained knowledge when using powerful pretrained models.
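The two-step LP-FT strategy described in the abstract can be illustrated with a short sketch. The code below is a minimal, hypothetical PyTorch version, not the authors' released implementation: it assumes a generic pretrained `backbone` that maps inputs to feature vectors, a freshly initialized linear `head`, and a standard supervised data loader; the learning rates and epoch counts are illustrative placeholders.

```python
# Minimal sketch of linear probing then full fine-tuning (LP-FT).
# Assumptions: `backbone` is a pretrained feature extractor, `head` is a new
# nn.Linear classifier, `loader` yields (inputs, labels) batches. Hyperparameters
# are placeholders, not values from the talk.
import torch
import torch.nn as nn


def train(model, loader, optimizer, loss_fn, epochs):
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()


def lp_ft(backbone: nn.Module, head: nn.Linear, loader, epochs_lp=5, epochs_ft=5):
    model = nn.Sequential(backbone, head)
    loss_fn = nn.CrossEntropyLoss()

    # Step 1: linear probing. Freeze the pretrained backbone and fit only the
    # linear head, so the pretrained features are left undistorted.
    for p in backbone.parameters():
        p.requires_grad = False
    opt_lp = torch.optim.Adam(head.parameters(), lr=1e-3)
    train(model, loader, opt_lp, loss_fn, epochs_lp)

    # Step 2: full fine-tuning. Unfreeze all parameters and continue training
    # from the probed head, typically with a smaller learning rate.
    for p in backbone.parameters():
        p.requires_grad = True
    opt_ft = torch.optim.Adam(model.parameters(), lr=1e-5)
    train(model, loader, opt_ft, loss_fn, epochs_ft)
    return model
```

The key design point, as the abstract notes, is ordering: because the head is already well aligned with the pretrained features before full fine-tuning begins, the second stage makes smaller changes to the backbone, which helps preserve features that matter out of distribution.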
