Video Presentation in Session: Creative AI Videos
Re·col·lec·tions. Sharing sonic memories through interactive machine learning and neural audio synthesis models.
Gabriel Vigliensoni · Rebecca Fiebrink
“Re·col·lec·tions” is a sound and music exploration in which we engage with a neural audio synthesis model in real time through gestural interaction and interactive machine learning. The neural audio synthesis model used in the work was trained on part of the sound archive of the Museo de la Memoria y los Derechos Humanos in Santiago, Chile.
In “Re·col·lec·tions,” we mapped the human-performance space to the high-dimensional, computer-generated latent space of a neural audio model using a regression model learned from a set of demonstrated actions. By applying this method to ideation, exploration, and sound and music performance, we have found that it provides efficient, flexible, and immediate control over generative audio processes.
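As an illustration of this mapping strategy, the sketch below trains a small multi-output regression on demonstration pairs. It is a minimal stand-in using scikit-learn rather than the Wekinator/FluCoMa setup used in the work, and it assumes a hypothetical 2-D gesture space, an 8-dimensional latent space, and random placeholder data in place of recorded demonstrations.

```python
# Minimal sketch of a regression-based mapping from performance space to latent space.
# Assumptions: 2-D gesture input, 8-D latent target, scikit-learn in place of
# Wekinator/FluCoMa, and random placeholder data standing in for demonstrations.
import numpy as np
from sklearn.neural_network import MLPRegressor

LATENT_DIM = 8          # assumed size of the latent space
N_DEMONSTRATIONS = 50   # number of demonstration pairs

# Each demonstration pairs a gesture position with a latent vector chosen by ear.
rng = np.random.default_rng(0)
gestures = rng.uniform(0.0, 1.0, size=(N_DEMONSTRATIONS, 2))          # performance space
latents = rng.normal(0.0, 1.0, size=(N_DEMONSTRATIONS, LATENT_DIM))   # latent targets

# A small multi-output regressor learns the gesture -> latent mapping.
mapper = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
mapper.fit(gestures, latents)

# At performance time, any new gesture position is mapped to a latent vector.
new_gesture = np.array([[0.25, 0.75]])
z = mapper.predict(new_gesture)   # shape (1, LATENT_DIM), fed to the decoder
print(z.shape)
```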
Real-time control is crucial in neural audio synthesis systems because it lets performers introduce the long-term temporal coherence these systems often lack. Even if a generative model only produces audio signals with short-term temporal coherence, longer-scale structures can emerge when appropriate control is applied during generation.
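To make this concrete, the following sketch steers a decoder along a slowly varying latent trajectory, so the long-term shape of the output comes from the control curve rather than from the model itself. It assumes a RAVE model exported to TorchScript that exposes a decode() method; the file name, latent size, and frame count are placeholders.

```python
# Sketch: impose long-term structure by steering the latent space over time.
# Assumes a TorchScript RAVE export ("model.ts") whose decode() takes latents of
# shape (batch, latent_dim, n_frames); names and sizes here are placeholders.
import torch

model = torch.jit.load("model.ts").eval()   # placeholder path to an exported RAVE model
LATENT_DIM = 8                              # assumed latent dimensionality
N_FRAMES = 512                              # number of latent frames to decode

# A slow interpolation between two latent points acts as a long-term control curve.
z_start = torch.randn(LATENT_DIM, 1)
z_end = torch.randn(LATENT_DIM, 1)
ramp = torch.linspace(0.0, 1.0, N_FRAMES)        # slowly varying control signal
z = (1 - ramp) * z_start + ramp * z_end          # shape (LATENT_DIM, N_FRAMES)

with torch.no_grad():
    audio = model.decode(z.unsqueeze(0))         # shape (1, 1, n_samples)
```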
The technologies used in “Re·col·lec·tions” include MediaPipe Face Mesh (Kartynnik et al., 2019) for real-time acquisition of face landmarks; RAVE (Caillon and Esling, 2021) for sound modeling and synthesis; Wekinator (Fiebrink, Trueman, and Cook, 2009) and FluCoMa (Tremblay, Roma, and Green, 2021) for mapping the human-performance space to the model's latent space; and an interactive machine learning approach to steering latent audio models (Vigliensoni and Fiebrink, 2023).
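The acquisition end of this pipeline can be sketched as follows, assuming the MediaPipe "solutions" Face Mesh API and a webcam at index 0; the regression mapper and decoder referenced in the comments are stand-ins for the Wekinator/FluCoMa mapping and the exported RAVE model used in the piece.

```python
# Sketch: real-time face-landmark acquisition feeding a learned gesture-to-latent
# mapping. The mapper and decoder calls are placeholders (cf. the sketches above);
# in practice a reduced set of landmark features would likely be used as input.
import cv2
import mediapipe as mp
import numpy as np

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
capture = cv2.VideoCapture(0)    # assumed webcam index

try:
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not results.multi_face_landmarks:
            continue
        landmarks = results.multi_face_landmarks[0].landmark
        # Flatten the (x, y, z) landmark coordinates into a feature vector ...
        features = np.array([[c for lm in landmarks for c in (lm.x, lm.y, lm.z)]])
        # ... which a learned regression maps to latent coordinates that drive
        # the neural audio synthesis model, e.g.:
        # z = mapper.predict(features)
        # audio = model.decode(torch.from_numpy(z).float().unsqueeze(-1))
finally:
    capture.release()
    face_mesh.close()
```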