AI for Animal Pain Assessment: A New Challenge for Bioacoustics
Abstract
We introduce CoViNLE, a coarse-to-fine architecture for audio-visual question answering that refines global and local cues in videos into natural language descriptions. We evaluate its effectiveness in assessing acute pain in canines. This task is challenging because animals display only subtle behavioural and bioacoustic signs of pain. CoViNLE is evaluated against scores assigned by veterinary experts using the Glasgow Composite Pain Measure Index. The results reveal significant limitations of current models. Vision-only models overlook important behaviours. Audio-based models such as fine-tuned Whisper and NatureLM-audio identify vocal pain indicators but often produce unstable results and hallucinations. These findings highlight the need for more robust audio models and more diverse datasets for training bioacoustic large language models.