Invited talk
Workshop: Machine Learning for Audio

Computer Audition Disrupted 2.0: The Foundation Models Era

Bjoern Schuller


Computer Audition is changing. Since the advent of Large Audio, Language, and Multimodal Models, or more generally Foundation Models, a new age has begun. The emergence of abilities in such large models through zero- or few-shot learning renders it partially unnecessary to collect task-specific data and train a corresponding model. After the last major disruption (learning representations and model architectures directly from data), this can be judged the second major disruption in a field that was once characterised by highly specialised features, approaches, and datasets, and that is now shifting towards being absorbed by the sheer size of the models and of the data used for their training. In this talk, I will first argue that Computer Audition will be massively influenced by this “plate displacement” in Artificial Intelligence as a whole. I will then move towards “informed tea-leaf reading” of how present and future Computer Audition will change in more detail. This includes prompt optimisation, fine-tuning, or the synergistic combination of different foundation models and traditional approaches. Finally, I will turn to the dangers of this new glittery era: among many, the “nightshades” of audio may soon start to poison audio data. A new time has begun. It will empower Computer Audition at a whole new level while challenging us in whole new ways. Let’s get ready.
