Scaling Frog Monitoring with FrogID: A Robust Classification Pipeline for Citizen Science Using Bioacoustic Foundation Models
Abstract
Amid global biodiversity declines, audio-based citizen science projects offer significant potential for biodiversity monitoring, but the need for manual validation limits their scalability. The FrogID project has gathered over 1.3 million frog records from more than 800,000 audio submissions, advancing amphibian research and conservation in Australia. Yet manual species identification remains time-consuming: it creates backlogs, delays conservation action, and reduces user engagement. We present a frog species identification pipeline that first refines coarse annotations by combining unsupervised source separation with an audio-language foundation model, and then applies transfer learning from cross-taxa embeddings to train a hybrid classifier. The method achieves strong per-species performance even on low-quality audio, enabling scalable frog monitoring.