HAC-Net: Learning Natural Units from Acoustic Change
Abstract
Animals structure their vocalizations around acoustic change points: boundaries where one element transitions to another or the context shifts. These transitions reflect underlying production mechanisms and guide receiver perception. Yet most bioacoustic analyses still rely on predefined categories, energy-based rules, or generic audio codecs that ignore these natural boundaries. We propose HAC-Net, a method that discovers acoustic units by learning where patterns change in continuous recordings. The model is trained to reconstruct audio from the boundaries it identifies, which pushes it to place cuts at genuine transitions. It operates hierarchically, finding both fine-grained elements and larger structures. We expect this method to yield biologically grounded segmentation that supports the discovery of meaningful variation and provides units suitable for sequence modeling. The resulting units should enable large-scale comparative studies across species without expert annotations, providing a consistent foundation for analyzing compositional structure, temporal organization, and downstream ecological applications.
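
For intuition only, the notion of an acoustic change point can be illustrated with a hand-crafted baseline. The sketch below is not HAC-Net and involves no learning: it swaps in a classical spectral-novelty detector that marks a boundary wherever the normalized magnitude spectrum changes sharply between consecutive frames. The function names `spectral_novelty` and `pick_boundaries`, the frame length, hop size, threshold, and the synthetic two-tone signal are all assumptions made for this illustration and do not come from the method itself.

```python
import numpy as np

def spectral_novelty(signal, frame_len=256, hop=128):
    """Frame-to-frame spectral change (hand-crafted stand-in, not HAC-Net)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # L1-normalize each frame so novelty tracks spectral shape, not loudness.
    mag = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)
    # novelty[k] measures the change between frame k and frame k + 1.
    return np.abs(np.diff(mag, axis=0)).sum(axis=1)

def pick_boundaries(novelty, frame_len=256, hop=128, threshold=0.5):
    """Sample positions of local novelty peaks above an assumed threshold."""
    peaks = [k for k in range(1, len(novelty) - 1)
             if novelty[k] > threshold
             and novelty[k] >= novelty[k - 1]
             and novelty[k] > novelty[k + 1]]
    # Report the centre of the span covered by frames k and k + 1.
    return [k * hop + (hop + frame_len) // 2 for k in peaks]

# Toy input (assumed): a 1 kHz tone that switches to 3 kHz at sample 4000.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
toy_signal = np.where(t < 0.5,
                      np.sin(2 * np.pi * 1000 * t),
                      np.sin(2 * np.pi * 3000 * t))

novelty = spectral_novelty(toy_signal)
boundaries = pick_boundaries(novelty)
print(boundaries)
```

On this toy input the detected boundary is expected to fall close to the tone switch at sample 4000. A fixed rule of this kind depends on a hand-tuned threshold and a single time scale, which is the sort of limitation that motivates learning boundaries from the data instead.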