Towards Mitigating Systematics in Large-Scale Surveys via Few-Shot Optimal Transport-Based Feature Alignment
Abstract
Systematics contaminate observables, shifting their distribution relative to theoretically simulated signals. This shift poses a major challenge for using pre-trained models to label such observables. Since systematics are often poorly understood and difficult to model, removing them directly and entirely may not be feasible. In this work, we propose a novel method to align the features of in-distribution (ID) and out-of-distribution (OOD) samples using a model pre-trained on ID data. We first validate the method experimentally on the MNIST dataset with candidate alignment losses, including mean squared error (MSE) and optimal transport, and subsequently apply it to large-scale maps of neutral hydrogen (HI). Our results show that optimal transport is particularly effective at aligning OOD features when parity between ID and OOD samples is unknown, even with limited data, mimicking real-world conditions in extracting information from large-scale surveys.
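To make the contrast between the two alignment losses concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that features are plain NumPy arrays of shape (samples, dimensions), reads "parity" as a known sample-to-sample pairing, and uses entropy-regularised (Sinkhorn) optimal transport as a stand-in for the optimal transport loss. The function names, the regularisation strength `eps`, and the iteration count are illustrative assumptions.

```python
import numpy as np


def mse_alignment_loss(f_id, f_ood):
    """MSE between feature sets; assumes row i of f_id is paired with row i of f_ood."""
    return float(np.mean((f_id - f_ood) ** 2))


def sinkhorn_ot_loss(f_id, f_ood, eps=0.05, n_iters=200):
    """Entropy-regularised OT cost between two feature sets with uniform weights.

    No pairing between rows is assumed. `eps` is a hypothetical regularisation
    strength, expressed relative to the largest pairwise cost.
    """
    n, m = f_id.shape[0], f_ood.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    cost = np.sum((f_id[:, None, :] - f_ood[None, :, :]) ** 2, axis=-1)
    scale = cost.max() if cost.max() > 0 else 1.0
    kernel = np.exp(-cost / (eps * scale))
    a = np.full(n, 1.0 / n)  # uniform mass on ID samples
    b = np.full(m, 1.0 / m)  # uniform mass on OOD samples
    u = np.ones(n)
    v = np.ones(m)
    # Sinkhorn iterations: rescale rows and columns to match the marginals.
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost))


# Toy features: OOD features are a lightly perturbed copy of the ID features.
rng = np.random.default_rng(0)
f_id = rng.normal(size=(16, 8))
f_ood = f_id + 0.05 * rng.normal(size=(16, 8))
# Destroy the pairing with a cyclic shift of the OOD rows.
f_ood_shifted = np.roll(f_ood, 1, axis=0)

print("MSE, paired:  ", mse_alignment_loss(f_id, f_ood))
print("MSE, unpaired:", mse_alignment_loss(f_id, f_ood_shifted))
print("OT,  paired:  ", sinkhorn_ot_loss(f_id, f_ood))
print("OT,  unpaired:", sinkhorn_ot_loss(f_id, f_ood_shifted))
```

In this toy setting the MSE loss grows once the row order of the OOD features is shifted, whereas the transport cost is unchanged up to floating-point error, because it depends only on the two sets of features and not on their ordering. The sketch only evaluates the losses; training an alignment model would additionally require a differentiable implementation in an automatic-differentiation framework.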