
Workshop on Distribution Shifts: New Frontiers with Foundation Models

Towards Calibrated Robust Fine-Tuning of Vision-Language Models

Changdae Oh · Mijoo Kim · Hyesu Lim · Junhyeok Park · Euiseog Jeong · Zhi-Qi Cheng · Kyungwoo Song


While fine-tuning unleashes the potential of a pre-trained model for a specific task, it trades off the model's generalization capability on out-of-distribution (OOD) datasets. To mitigate this, robust fine-tuning aims to ensure performance on OOD datasets as well as on the in-distribution (ID) dataset for which the model is tuned. However, another criterion for reliable machine learning (ML), confidence calibration, is overlooked despite its increasing importance for real-world high-stakes ML applications (e.g., autonomous driving). First, we raise concerns about the calibration of fine-tuned vision-language models (VLMs) by showing that naive fine-tuning, and even state-of-the-art robust fine-tuning methods, hurt the calibration of pre-trained VLMs, especially on OOD datasets. To address this, we propose a simple approach, called calibrated robust fine-tuning (CaRot), that promotes calibration and robustness on both ID and OOD datasets. Empirical results on ImageNet-1K distribution-shift evaluation verify the effectiveness of our method.
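The confidence calibration discussed above is commonly quantified with the expected calibration error (ECE): predictions are binned by confidence, and the gap between average confidence and accuracy is averaged across bins. The sketch below is an illustrative, minimal implementation (the function name and binning scheme are our assumptions, not part of the paper's method):

```python
# Hypothetical sketch of expected calibration error (ECE), a standard
# calibration metric: bin predictions by confidence, then average the
# per-bin |accuracy - confidence| gap, weighted by bin size.
def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: predicted max-softmax probabilities in [0, 1];
    correct: booleans, True where the prediction matched the label."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Put confidence == 1.0 into the last bin.
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece

# A perfectly calibrated toy case: 0.8-confidence predictions that are
# right 80% of the time give an ECE of (numerically) zero.
toy_ece = expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2)
```

A fine-tuned model that becomes overconfident on OOD data (high confidence, low accuracy) would show a large ECE under this metric, which is the failure mode the abstract highlights.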
