Fine-Tuning without Distortion: Improving Robustness to Distribution Shifts
Percy Liang · Ananya Kumar

Sat Dec 03 08:30 AM -- 09:15 AM (PST)

Fine-tuning foundation models (such as BERT or CLIP) is one of the most successful ways to achieve high accuracy. But high in-distribution accuracy is not enough: high-stakes applications such as self-driving cars, medical diagnosis, and poverty mapping also require models that generalize to circumstances not seen in the fine-tuning distribution. To examine this, we also evaluate models on out-of-distribution (OOD) test data. We show that standard full fine-tuning of all the model’s parameters can distort pretrained information and underperform on OOD data. Instead, we explain why selectively tuning parts of the model (e.g., prefixes, linear probes, embedding layers) can preserve pretrained information and lead to better OOD performance. Our analysis suggests a simple two-step strategy, linear probing followed by full fine-tuning (LP-FT), which improves pretrained features without distorting them and leads to even higher accuracy. These works underscore the importance of preserving pretrained knowledge when using powerful pretrained models.
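
The two-step LP-FT recipe named in the abstract can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the speakers' implementation: the "pretrained model" is a hypothetical fixed feature matrix B_pre, the head is a linear vector v, the loss is mean squared error, and every name, step count, and learning rate here is chosen only for illustration. It shows the mechanics (train the head with features frozen, then train everything starting from that head) and makes no claim about OOD accuracy.

```python
import random

def predict(B, v, x):
    # Features are B @ x; the prediction is the head v applied to those features.
    feats = [sum(b * xj for b, xj in zip(row, x)) for row in B]
    return sum(vi * fi for vi, fi in zip(v, feats)), feats

def mse(B, v, data):
    return sum((predict(B, v, x)[0] - y) ** 2 for x, y in data) / len(data)

def train(B, v, data, lr, steps, train_features):
    """Gradient descent on mean squared error.

    The head v is always updated; the feature matrix B is updated
    only when train_features is True. Inputs are copied, not mutated.
    """
    B = [row[:] for row in B]
    v = v[:]
    n = len(data)
    for _ in range(steps):
        grad_v = [0.0] * len(v)
        grad_B = [[0.0] * len(B[0]) for _ in B]
        for x, y in data:
            pred, feats = predict(B, v, x)
            err = 2.0 * (pred - y) / n
            for i in range(len(v)):
                grad_v[i] += err * feats[i]
                if train_features:
                    for j in range(len(x)):
                        grad_B[i][j] += err * v[i] * x[j]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
        if train_features:
            B = [[b - lr * g for b, g in zip(row, grow)]
                 for row, grow in zip(B, grad_B)]
    return B, v

def lp_ft(B_pre, data, lp_steps=200, ft_steps=200, lp_lr=0.1, ft_lr=0.05):
    # Step 1 (linear probing): features stay frozen, only the head is trained.
    head_init = [0.0] * len(B_pre)
    B_lp, v_lp = train(B_pre, head_init, data, lp_lr, lp_steps, train_features=False)
    # Step 2 (full fine-tuning): start from the probed head and train all parameters.
    B_ft, v_ft = train(B_lp, v_lp, data, ft_lr, ft_steps, train_features=True)
    return (B_lp, v_lp), (B_ft, v_ft)

# Toy data (an assumption for illustration): targets depend on all three inputs,
# but the "pretrained" features only expose the first two.
random.seed(0)
w_true = [1.0, -2.0, 0.5]
data = []
for _ in range(20):
    x = [random.uniform(-1, 1) for _ in range(3)]
    data.append((x, sum(w * xi for w, xi in zip(w_true, x))))

B_pre = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
(B_lp, v_lp), (B_ft, v_ft) = lp_ft(B_pre, data)
print("loss after linear probing:", mse(B_lp, v_lp, data))
print("loss after LP-FT:", mse(B_ft, v_ft, data))
```

In this sketch the second stage starts from a head that already fits the frozen features, so the first fine-tuning updates to B are driven by a small residual rather than by a randomly initialized head. With a real network the same pattern would typically be expressed by freezing the backbone parameters during the first stage and unfreezing them for the second.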

Author Information

Percy Liang (Stanford University)
Ananya Kumar (Stanford University)

More from the Same Authors