An AI-Assisted Labeling Tool for Cataloging High-Resolution Images of Galaxies
Gustavo Perez · Sean Linden · Timothy McQuaid · Matteo Messa · Daniela Calzetti · Subhransu Maji

The Hubble Space Telescope (HST), the recently launched James Webb Space Telescope (JWST), and many Earth-based observatories collect data that allow astronomers to answer fundamental questions about the Universe. In this work we focus on an ecosystem of AI tools for cataloging bright sources within galaxies, and use them to analyze young star clusters, groups of stars held together by their gravitational fields. Their ages and masses, among other properties, provide insights into the process of star formation and the birth and evolution of galaxies. Significant domain expertise and resources are required to discriminate star clusters among the tens of thousands of sources that may be extracted for each galaxy. To accelerate this step, we propose: 1) a web-based annotation tool to label and visualize high-resolution astronomy data, encouraging efficient labeling and consensus building; and 2) techniques to reduce the annotation cost by leveraging recent advances in unsupervised representation learning on images. We present case studies in which we work with astronomy researchers to validate the annotation tool, and find that the proposed tools can reduce the annotation effort by 3$\times$ on existing HST catalogs while facilitating accelerated analysis of new data.

#### Author Information

##### Gustavo Perez (University of Massachusetts Amherst)

I am a Ph.D. student and a graduate research assistant in the College of Information and Computer Sciences at the University of Massachusetts Amherst, under the supervision of Subhransu Maji in the Computer Vision Lab. Previously, I spent three years as a graduate research assistant under the supervision of Pablo Arbeláez in the Biomedical Computer Vision Group at the Universidad de Los Andes, where I earned my Master's degree in Biomedical Engineering. I earned my Bachelor's degree in Electronics Engineering from Universidad del Norte in Barranquilla, Colombia. My research interests include the design of robust and data-efficient solutions that enable better science using computer vision and AI. In particular, I have experience working in the medical imaging, biology, chemistry, and astronomy domains on several computer vision tasks, including image-level classification, object detection, and semantic segmentation. I am also interested in transfer learning, semi- and self-supervised learning, and active learning.