Poster in Workshop: Causal Representation Learning

DISK: Domain Inference for Discovering Spurious Correlation with KL-Divergence

Yujin Han ⋅ Difan Zou

Keywords: spurious correlation, subpopulation shift, domain generalization

Abstract

Existing methods use domain information to address subpopulation shift and improve model generalization. However, domain information is not always available. In response to this challenge, we introduce DISK, a novel end-to-end method. DISK uses KL-divergence to discover the spurious correlations present in the training and validation sets, and assigns each instance a spurious label (which also serves as its domain label) by classifying it according to its spurious features. By combining spurious labels $y_s$ with true labels $y$, DISK partitions the data into groups, each with its own data distribution $\mathbb{P}(\mathbf{x}|y,y_s)$. The group partition inferred by DISK can then be used directly to design algorithms that further mitigate subpopulation shift and improve generalization on test data. Unlike existing domain inference methods such as ZIN and DISC, DISK reliably infers domains without requiring additional information. We evaluate DISK extensively on different datasets, covering scenarios where validation labels are available and where they are not, and demonstrate its effectiveness in inferring domains and mitigating subpopulation shift. Our results also suggest that, for some complex data, the neural network-based DISK may be able to perform more reasonable domain inference, which highlights the potential for effectively integrating DISK with human decisions when (human-defined) domain information is available. Code for DISK is available at [https://anonymous.4open.science/r/DISK-E23A/](https://anonymous.4open.science/r/DISK-E23A/).
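
To make the group partition concrete, the following is a minimal sketch of how true labels $y$ and spurious labels $y_s$ could be combined into group indices, assuming both are integer-coded. The function names (`assign_groups`, `group_distribution`, `kl_divergence`), the indexing scheme, and the toy labels are illustrative assumptions and are not taken from the DISK code. The KL computation only compares the group proportions of two splits; it is not DISK's training objective.

```python
import math
from collections import Counter


def assign_groups(y, y_s, num_spurious):
    """Map each (y, y_s) pair to one group index g = y * num_spurious + y_s.

    Hypothetical helper: assumes labels are integers starting at 0.
    """
    if len(y) != len(y_s):
        raise ValueError("y and y_s must have the same length")
    return [yi * num_spurious + si for yi, si in zip(y, y_s)]


def group_distribution(groups, num_groups):
    """Empirical proportion of samples falling in each group."""
    counts = Counter(groups)
    total = len(groups)
    return [counts.get(g, 0) / total for g in range(num_groups)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy data (made up): in "training", y_s mostly agrees with y;
# in "validation", the two labels are balanced.
y_train, s_train = [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0]
y_val, s_val = [0, 0, 1, 1], [0, 1, 0, 1]

train_groups = assign_groups(y_train, s_train, num_spurious=2)
val_groups = assign_groups(y_val, s_val, num_spurious=2)

p_train = group_distribution(train_groups, num_groups=4)
p_val = group_distribution(val_groups, num_groups=4)

# A positive value indicates that the group proportions differ across splits.
shift = kl_divergence(p_train, p_val)
print(train_groups, round(shift, 4))
```

With two classes and two spurious labels this yields four groups. A downstream method that relies on group information could then reweight or resample by group index.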
