
Workshop: NeurIPS 2023 Workshop on Machine Learning for Creativity and Design

SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration

Stephen Brade · · Bryan Wang · Mauricio Sousa · Tovi Grossman · Gregory Lee Newsome

Poster
Sat 16 Dec 1:30 p.m. PST — 2:30 p.m. PST


Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe, a full-stack system that uses multimodal deep learning to let users express their intentions at a much higher level. SynthScribe addresses three difficulties: 1) searching through existing sounds, 2) creating completely new sounds, and 3) making meaningful modifications to a given sound. This is achieved with three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm by which completely new sounds can be created and selected according to the user's preferences; and a sound-editing support feature that highlights, and gives examples for, key control parameters with respect to a text- or audio-based query. The combination of these features creates a novel workflow for musicians, exemplifying the usefulness of systems built on a foundation of multimodal deep learning.
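As a rough illustration of the first two features, the sketch below ranks a sound library by cosine similarity to a query embedding and breeds new candidate sounds from the ones a user selects. It is not the authors' implementation. The abstract does not name the embedding model, the synthesizer, or the genetic operators, so everything here is an assumption: the embeddings are taken as precomputed vectors in a shared text-audio space, synthesizer parameters are assumed to be normalized to [0, 1], and the names `search` and `next_generation` are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_vec, library, k=3):
    """Return the k library names whose embeddings best match the query.

    `library` maps a sound name to its (assumed precomputed) embedding.
    """
    ranked = sorted(
        library,
        key=lambda name: cosine(query_vec, library[name]),
        reverse=True,
    )
    return ranked[:k]


def next_generation(selected, size, mutation=0.05, rng=None):
    """Breed `size` parameter vectors from the user's selected sounds.

    The user's selection plays the role of the fitness function: only
    sounds the user picked become parents of the next generation.
    """
    rng = rng or random.Random()
    children = []
    for _ in range(size):
        a, b = rng.choice(selected), rng.choice(selected)
        # Uniform crossover: take each parameter from one parent at random.
        child = [rng.choice(pair) for pair in zip(a, b)]
        # Gaussian mutation, clamped to the assumed [0, 1] parameter range.
        child = [min(1.0, max(0.0, p + rng.gauss(0.0, mutation))) for p in child]
        children.append(child)
    return children
```

In an interactive loop, the user would audition the children, pick the ones they like, and pass those back in as `selected` for the next round.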
