The Performance Cost of Representational Misalignment
Abstract
Several recent results suggest that brain-like computations emerge in Deep Neural Networks (DNNs) trained on naturalistic stimuli, leading to the hypothesis that these shared representations arise because they are necessary for optimal task performance. However, existing studies primarily demonstrate correlations between alignment and performance rather than establishing causality. We address this gap by proposing a representational perturbation framework that actively promotes or suppresses alignment with reference representations during training while maintaining task optimization. This allows us to test whether representational alignment is necessary for optimal performance or merely coincidental. We train over 60 large-scale vision models under varying alignment constraints, constructing Pareto-optimal curves that quantify the trade-off between representational alignment and task performance. Our results consistently show that models trained to minimize alignment with oracle theoretical models, pretrained networks, or brain responses achieve worse task performance than those trained to maximize alignment. This provides the first causal evidence that representational alignment is functionally important rather than epiphenomenal.
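To make the kind of objective such a framework implies concrete, the sketch below adds a signed alignment term to the task loss, so that training can either promote or suppress similarity to a fixed reference representation. This is a minimal illustration, not the paper's actual implementation: linear CKA is assumed as the similarity measure, and the names `linear_cka`, `perturbed_loss`, `lam`, and `promote` are hypothetical.

```python
import torch

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between two representation matrices
    # of shape (n_samples, n_features). CKA is one common alignment measure;
    # the paper's exact metric is an assumption here.
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (X.T @ Y).norm() ** 2
    return hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())

def perturbed_loss(task_loss, features, reference, lam=1.0, promote=True):
    # Combined objective: optimize the task while steering the model's
    # internal representations toward (promote=True) or away from
    # (promote=False) a fixed reference representation.
    sign = -1.0 if promote else 1.0
    return task_loss + lam * sign * linear_cka(features, reference)

# Usage sketch: features from the model being trained, reference from a
# frozen oracle model, pretrained network, or recorded brain responses.
feats = torch.randn(128, 512, requires_grad=True)
ref = torch.randn(128, 512)
loss = perturbed_loss(torch.tensor(0.7), feats, ref, lam=0.5, promote=False)
loss.backward()
```

Sweeping `lam` across values and signs under a design like this is one way to trace out a Pareto curve between alignment and task performance, as the abstract describes.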