An Empirical Study of Pre-trained Models on Out-of-distribution Generalization
Yaodong Yu · Heinrich Jiang · Dara Bahri · Hossein Mobahi · Seungyeon Kim · Ankit Rawat · Andreas Veit · Yi Ma
Event URL: https://openreview.net/forum?id=z-LBrGmZaNs

Generalizing to out-of-distribution (OOD) data, that is, data from domains unseen during training, is a key challenge in modern machine learning that has only recently received much attention. Some existing approaches propose leveraging larger models and pre-training on larger datasets. In this paper, we provide new insights into applying these approaches. Concretely, we show that larger models and larger datasets need to be leveraged simultaneously to improve OOD performance. Moreover, we show that using smaller learning rates during fine-tuning is critical to achieving good results, contrary to the popular intuition that larger learning rates generalize better when training from scratch. We also show that strategies that improve in-distribution accuracy may, counter-intuitively, lead to poor OOD performance. Our insights culminate in a method that achieves state-of-the-art results on a number of OOD generalization benchmark tasks, often by a significant margin.

Author Information

Yaodong Yu (University of California, Berkeley)
Heinrich Jiang (Google Research)
Dara Bahri (Google AI)
Hossein Mobahi (Google Research)
Seungyeon Kim (Google Research)
Ankit Rawat (University of Texas at Austin)
Andreas Veit (Google)
Yi Ma (UC Berkeley)

More from the Same Authors