Skip to yearly menu bar Skip to main content

Workshop: Workshop on Distribution Shifts: Connecting Methods and Applications

A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias

Puja Trivedi · Danai Koutra · Jayaraman Thiagarajan


In order to achieve strong in-distribution (ID) and out-of-distribution (OOD) generalization during transfer learning, it was recently argued that adaptation protocols should better leverage the expressivity of high-quality, pretrained models by controlling feature distortion (FD), i.e., the failure to update features orthogonal to the ID. However, in addition to OOD generalization, practical applications require that adapted models are also safe. To this end, we study the susceptibility of common adaptation protocols to simplicity bias (SB), i.e., the well-known propensity of neural networks to rely upon simple features, as this phenomenon has recently been shown to underlie several problems in safe generalization. Using a controllable, synthetic setting, we demonstrate that solely controlling FD is not sufficient to avoid SB, harming in safe generalization. Given the need to control both SB and FD for improved safety and ID/OOD generalization, we propose modifying a recently proposed protocol with goal of reducing SB. We verify the effectiveness of these modified protocols in decreasing SB on synthetic setting, and in jointly improving OOD generalization and safety on standard adaptation benchmarks.

Chat is not available.