Poster
Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line
Eungyeup Kim · Mingjie Sun · Christina Baek · Aditi Raghunathan · J. Zico Kolter
East Exhibit Hall A-C #4501
Recently, Miller et al. (2021) and Baek et al. (2022) empirically demonstrated strong linear correlations between in-distribution (ID) and out-of-distribution (OOD) accuracy and agreement. These trends, coined accuracy-on-the-line (ACL) and agreement-on-the-line (AGL), enable OOD model selection and performance estimation without labeled data. However, these phenomena also break for certain shifts, such as CIFAR10-C Gaussian Noise, posing a critical bottleneck. In this paper, we make a key finding that recent test-time adaptation (TTA) methods not only improve OOD performance, but also drastically strengthen the ACL and AGL trends in models, even in shifts where models showed very weak correlations before. To analyze this, we revisit the theoretical conditions established by Miller et al. (2021), which demonstrate that ACL appears in Gaussian data if the distributions shift only in mean and covariance scale. We find that these theoretical conditions hold when deep networks are adapted to OOD data, e.g., CIFAR10-C: adapted models embed the original data distributions, which undergo complex shifts, into feature-space distributions whose shift is captured by a single "scaling" variable. Building on these stronger linear trends, we demonstrate that combining TTA and AGL-based methods can predict OOD performance with high precision for a broader set of distribution shifts. Furthermore, we can leverage ACL and AGL to perform hyperparameter search and select the best adaptation strategy without any OOD labeled data.
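To make the AGL-based estimation idea concrete, here is a minimal sketch of one way such an estimator can work. It is an illustration under stated assumptions, not the authors' implementation. It assumes that agreement and accuracy follow the same linear trend after a probit transform, fits that line on unlabeled ID/OOD agreement values, and then applies it to labeled ID accuracy. The function names (`probit`, `fit_line`, `estimate_ood_accuracy`) and the clipping constant `eps` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def probit(p, eps=1e-6):
    """Inverse standard-normal CDF, with clipping to avoid infinities at 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(p)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + bias."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ood_accuracy(id_agreement, ood_agreement, id_accuracy):
    """Estimate OOD accuracy without OOD labels.

    id_agreement / ood_agreement: parallel lists of agreement rates between
    model pairs, which need no labels to compute.
    id_accuracy: ID accuracies of the models whose OOD accuracy is wanted.
    Assumption: accuracy shares the slope and bias of the agreement line.
    """
    slope, bias = fit_line(
        [probit(a) for a in id_agreement],
        [probit(a) for a in ood_agreement],
    )
    return [_STD_NORMAL.cdf(slope * probit(acc) + bias) for acc in id_accuracy]
```

When the agreement trend is weak, the fitted line is a poor proxy for the accuracy line, which is why the strengthened trends after TTA matter for this kind of estimate.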