From Simulations to Surveys: Domain Adaptation for Galaxy Observations
Kaley Brauer · Aditya Prasad Dash · Meet Vyas · Ahmed Salim
Abstract
Large photometric surveys will image billions of galaxies, but information about their physical properties, such as morphology, stellar mass, and star-formation rate, remains scarce and biased. Simulations provide galaxy images with ground-truth physical labels, but domain shifts in PSF, noise, backgrounds, selection, and label priors degrade transfer to real surveys. We present a preliminary domain adaptation pipeline that trains on simulated TNG50 galaxies and evaluates on real DESI galaxies with morphology labels (elliptical/spiral/irregular). A compact CNN is trained with supervision on the source domain, and its features are aligned to unlabeled target features using one of three losses on L2-normalized embeddings: an energy/MMD-style loss, Sinkhorn (entropic OT), or adversarial (DANN). With a fixed alignment strength ($\lambda{=}0.1$), MMD delivers the best final target improvement, an increase of +13.4\% in accuracy compared to no adaptation. All methods peak early, motivating early stopping and careful balancing of the alignment strength $\lambda$. Domain-gap diagnostics correlate only weakly with accuracy, highlighting that distribution matching alone is insufficient. Results suggest class-aware alignment and label-prior constraints as next steps, and the same recipe can extend beyond morphology to stellar mass, star-formation rate, and photometric redshift.
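To make the alignment objective concrete, the following is a minimal NumPy sketch of an energy-distance loss on L2-normalized embeddings, added to a supervised source loss with weight $\lambda$. It is an illustration under stated assumptions, not the authors' implementation: the abstract says only "energy/MMD-style loss", so the exact kernel, the biased (V-statistic) estimator, and the function names `l2_normalize`, `energy_distance`, and `total_loss` are all hypothetical choices made here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row (embedding) to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def pairwise_dist(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def energy_distance(source, target):
    """Biased energy-distance estimate between two embedding batches.

    Assumption: this is one common 'energy/MMD-style' discrepancy,
    2 E|s - t| - E|s - s'| - E|t - t'|, computed on unit-norm features.
    """
    s = l2_normalize(source)
    t = l2_normalize(target)
    return (2.0 * pairwise_dist(s, t).mean()
            - pairwise_dist(s, s).mean()
            - pairwise_dist(t, t).mean())

def total_loss(cls_loss, source_feats, target_feats, lam=0.1):
    """Supervised source loss plus lambda-weighted alignment term."""
    return cls_loss + lam * energy_distance(source_feats, target_feats)

# Toy batches standing in for CNN embeddings (hypothetical data).
rng = np.random.default_rng(0)
src = rng.normal(size=(32, 16))          # "simulated" features
tgt = rng.normal(size=(32, 16)) + 5.0    # shifted "survey" features
gap_same = energy_distance(src, src)
gap_shifted = energy_distance(src, tgt)
```

In this sketch the discrepancy vanishes when both batches are identical and grows as the target features drift away from the source, which is the quantity the alignment term penalizes. In a real training loop the same expression would be written in an autodiff framework so that gradients flow back into the CNN.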