Retrieval-Augmented Bioacoustics: Evidence-Guided Generation for Animal Communication
Abstract
Animal vocalizations carry information about communication, context, and behavior, but most current AI approaches in bioacoustics focus on narrow tasks such as species classification or call detection. Methods that help researchers interpret and summarize acoustic data in a grounded, transparent way are still lacking. This proposal introduces Retrieval-Augmented Bioacoustics (RAB), a framework that combines acoustic embeddings with retrieval from call libraries and generative modeling. Retrieval supplies concrete evidence in the form of matched library calls, while generation produces outputs such as annotation suggestions, monitoring summaries, cross-species communication hypotheses, and prototype call synthesis. Two design choices strengthen the framework: adapting the number of retrieved neighbors to the signal quality of the query, and citing the retrieved calls directly in generated outputs so that each statement can be traced to its evidence. RAB is model-agnostic: it can be applied on top of existing or future embedding models, with potential impact on both ethological research and conservation applications.
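The two design choices named in the abstract can be illustrated with a minimal sketch; it is not the proposal's implementation. It rests on several assumptions that the abstract does not specify: signal quality is approximated by a signal-to-noise ratio in dB, noisier queries retrieve more neighbors through a linear mapping, retrieval uses cosine similarity over precomputed embeddings, and the function names (`adaptive_k`, `retrieve_with_citations`, `cite`) and call identifiers are hypothetical.

```python
import numpy as np


def adaptive_k(snr_db, k_min=3, k_max=15, snr_low=0.0, snr_high=30.0):
    """Map signal quality to a neighbor count.

    Assumption: quality is an SNR in dB, and lower-quality queries
    should retrieve MORE neighbors (linear interpolation between bounds).
    """
    quality = np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0)
    return int(round(float(k_max - quality * (k_max - k_min))))


def retrieve_with_citations(query, library, call_ids, snr_db):
    """Return the top-k library calls as (call_id, cosine_similarity) pairs."""
    k = min(adaptive_k(snr_db), len(call_ids))
    # Normalize so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q
    top = np.argsort(-sims)[:k]  # indices of the k most similar calls
    return [(call_ids[i], float(sims[i])) for i in top]


def cite(evidence):
    """Format retrieved calls as inline citations for a generated output."""
    return "; ".join(f"[{cid}, sim={s:.2f}]" for cid, s in evidence)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    library = rng.normal(size=(20, 8))            # stand-in call embeddings
    call_ids = [f"call_{i:03d}" for i in range(20)]  # hypothetical identifiers
    query = library[4] + 0.1 * rng.normal(size=8)    # noisy copy of one call
    evidence = retrieve_with_citations(query, library, call_ids, snr_db=25.0)
    print("Suggested annotation is supported by:", cite(evidence))
```

The citation string is what a generative component would append to each output; in this sketch a high-quality recording yields a short, focused evidence list and a noisy one a broader list, though the opposite policy would be equally compatible with the abstract.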