SING: Symbol-to-Instrument Neural Generator
Alexandre Defossez · Neil Zeghidour · Nicolas Usunier · Leon Bottou · Francis Bach

Tue Dec 4th 05:00 -- 07:00 PM @ Room 210 #90

Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because they are based on autoregressive models that generate audio samples one at a time at a rate of 16kHz. In this work, we study the more computationally efficient alternative of generating the waveform frame-by-frame with large strides. We present a lightweight neural audio synthesizer for the original task of generating musical notes given desired instrument, pitch and velocity. Our model is trained end-to-end to generate notes from nearly 1000 instruments with a single decoder, thanks to a new loss function that minimizes the distances between the log spectrograms of the generated and target waveforms. On the generalization task of synthesizing notes for pairs of pitch and instrument not seen during training, SING produces audio with significantly improved perceptual quality compared to a state-of-the-art autoencoder based on WaveNet as measured by a Mean Opinion Score (MOS), and is about 32 times faster for training and 2, 500 times faster for inference.

Author Information

Alexandre Defossez (Facebook)
Neil Zeghidour (Facebook A.I. Research / Ecole Normale Supérieure)
Nicolas Usunier (Facebook AI Research)
Leon Bottou (Facebook AI Research)

Léon Bottou received a Diplôme from l'Ecole Polytechnique, Paris in 1987, a Magistère en Mathématiques Fondamentales et Appliquées et Informatiques from Ecole Normale Supérieure, Paris in 1988, and a PhD in Computer Science from Université de Paris-Sud in 1991. He joined AT&T Bell Labs from 1991 to 1992 and AT&T Labs from 1995 to 2002. Between 1992 and 1995 he was chairman of Neuristique in Paris, a small company pioneering machine learning for data mining applications. He has been with NEC Labs America in Princeton since 2002. Léon's primary research interest is machine learning. His contributions to this field address theory, algorithms and large scale applications. Léon's secondary research interest is data compression and coding. His best known contribution in this field is the DjVu document compression technology ( Léon published over 70 papers and is serving on the boards of JMLR and IEEE TPAMI. He also serves on the scientific advisory board of Kxen Inc .

Francis Bach (INRIA - Ecole Normale Superieure)

More from the Same Authors