Self-Supervised vs Supervised Representation Learning for Fin Whale Vocalization Detection
Adam Chareyre · Haodong Zhang · Shuwen Ge · Randall Balestriero · Hervé Glotin · Sébastien Paris
Abstract
Fin whales produce low-frequency vocalizations that are critical for population monitoring but are often masked by anthropogenic noise. While supervised detectors perform well, they require costly labels and degrade under noise or data scarcity. We present the first application of self-supervised learning (SSL) to fin-whale detection, combining contrastive predictive coding with an amplitude-aware encoder. Across datasets we collected in an Arctic fjord in Norway and in the Mediterranean Sea, SSL models outperform supervised Transformer encoders in low-label regimes (88.5% and 68.6% F1-score, respectively, with 0.1% of the training set) and low-SNR regimes (87.4% and 81.3% F1-score, respectively, for SNR $\le -5$ with the training set), and transfer effectively across regions. Embedding visualizations further show robust class separability. These results highlight SSL as a scalable, label-efficient approach for passive acoustic monitoring that reduces annotation needs across diverse marine habitats.
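The contrastive predictive coding objective mentioned in the abstract is typically trained with an InfoNCE loss, where a context-based prediction of a future latent frame must be distinguished from negatives drawn from other examples. The sketch below is a generic, minimal illustration of that loss; it is not the authors' implementation, and the amplitude-aware encoder and all hyperparameters (batch layout, `temperature`) are assumptions for illustration.

```python
import numpy as np

def infonce_loss(pred, future, temperature=0.1):
    """InfoNCE loss as used in contrastive predictive coding (CPC).

    pred:   (B, D) context-based predictions of future latent frames
    future: (B, D) encoder outputs for the true future frames
    Row i of `future` is the positive for row i of `pred`; all other
    rows in the batch serve as negatives.
    """
    # Cosine-similarity logits between every prediction and every latent
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    future = future / np.linalg.norm(future, axis=1, keepdims=True)
    logits = pred @ future.T / temperature            # (B, B)
    # Softmax cross-entropy with the diagonal as the positive class
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When predictions align with their true future latents, the loss approaches zero; misaligned predictions yield a larger loss, which is what drives the encoder to learn noise-robust representations without labels.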