On the Potential of the Four-Point Model for Studying the Role of Optimization in Robustness to Spurious Correlations
Abstract
Theoretical progress has recently been made in understanding how machine learning models come to rely on spurious correlations. Empirical findings highlight the influence of stochastic gradient descent (SGD) and its optimization hyperparameters on this behavior, yet existing theories offer little justification for these phenomena, and a grounded theoretical explanation remains lacking. In this work, we revisit the four-point framework, a widely used theoretical tool for analyzing spurious correlations, to investigate how batch size affects the learning speed of invariant features in the presence of spurious correlations. Our results show that the framework can account for the faster acquisition of invariant features in small-batch regimes, offering a principled perspective on the role of SGD and its hyperparameters in shaping reliance on spurious correlations. This analysis contributes to a deeper theoretical understanding of the mechanisms underlying robustness and generalization in machine learning.