Workshop: Machine Learning for Audio

Creative Text-to-Audio Generation via Synthesizer Programming

Nikhil Singh · Manuel Cherep · Jessica Shand


Sound designers have long harnessed the power of abstraction to distill and highlight the semantic essence of real-world auditory phenomena, much as a simple sketch can vividly convey a visual concept. Current neural audio synthesis methods, however, lean heavily towards capturing acoustic realism. We introduce a novel, open-source method centered on meaningful abstraction. Our approach takes a text prompt and iteratively refines the parameters of a virtual modular synthesizer to produce sounds with high semantic alignment to the prompt, as predicted by a pretrained audio-language model. Our results underscore the distinctiveness of our method compared with both real recordings and state-of-the-art generative models.
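The refine-and-score loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-parameter toy synthesizer, the `toy_alignment_score` function (a stand-in for the pretrained audio-language model), the `TOY_TARGETS` table, and the hill-climbing search are all assumptions made for illustration, since the abstract names neither the optimizer nor the synthesizer architecture.

```python
import math
import random

SAMPLE_RATE = 8000   # samples per second (toy value)
NUM_SAMPLES = 2000   # 0.25 s of audio

# Hypothetical prompt -> target features (zero-crossing rate, RMS level).
# A real system would query an audio-language model instead.
TOY_TARGETS = {
    "high chirp": (0.50, 0.10),
    "low hum": (0.03, 0.50),
}


def synthesize(params):
    """Toy synthesizer: a sine wave with exponential decay."""
    freq_hz, decay = params
    return [
        math.exp(-decay * n / SAMPLE_RATE)
        * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(NUM_SAMPLES)
    ]


def toy_alignment_score(audio, prompt):
    """Stand-in for text-audio similarity; higher is better."""
    target_zcr, target_rms = TOY_TARGETS[prompt]
    crossings = sum(1 for a, b in zip(audio, audio[1:]) if a * b < 0)
    zcr = crossings / len(audio)
    rms = math.sqrt(sum(x * x for x in audio) / len(audio))
    return -((zcr - target_zcr) ** 2 + (rms - target_rms) ** 2)


def clamp(value, low, high):
    """Keep a parameter inside its allowed range."""
    return max(low, min(high, value))


def refine(prompt, steps=200, seed=0):
    """Hill-climb synthesizer parameters to raise the alignment score."""
    rng = random.Random(seed)
    best = (rng.uniform(50.0, 3000.0), rng.uniform(0.0, 50.0))
    best_score = toy_alignment_score(synthesize(best), prompt)
    history = [best_score]
    for _ in range(steps):
        # Propose a small random change to the current best parameters.
        candidate = (
            clamp(best[0] + rng.gauss(0.0, 100.0), 50.0, 3000.0),
            clamp(best[1] + rng.gauss(0.0, 2.0), 0.0, 50.0),
        )
        score = toy_alignment_score(synthesize(candidate), prompt)
        # Keep the candidate only if the score improves.
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, history
```

Calling `refine("low hum")` returns the best parameters found and the best score after each step. The loop treats the synthesizer and the scorer as black boxes, so in principle a real text-audio similarity function could replace `toy_alignment_score` without changing the search itself.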
