
Workshop: Machine Learning for Audio

Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data

Tashi Namgyal · Alexander Hepburn · Raul Santos-Rodriguez · Valero Laparra · Jesús Malo


Perceptual metrics are traditionally used to evaluate the quality of natural signals, such as images and audio. They are designed to mimic the perceptual behaviour of human observers and usually reflect structures found in natural signals. This motivates their use as loss functions for training generative models, so that the models learn to capture the structure held in the metric. We take this idea to the extreme in the audio domain by training a compressive autoencoder to reconstruct uniform noise in lieu of natural data. We show that, compared with models trained with a standard Euclidean loss, training with perceptual losses improves the reconstruction of spectrograms and re-synthesized audio at test time. This demonstrates that perceptual metrics give better generalisation to unseen natural signals.
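
To make the setup concrete, the sketch below samples uniform noise as stand-in training targets and scores a crude reconstruction with two losses. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and `log_compressed_loss` is only a simple placeholder for a perceptual metric (the abstract does not say which metrics were used). No autoencoder is trained here; the "reconstruction" is just the target plus a constant offset.

```python
import math
import random


def uniform_noise_batch(n_items, n_bins, seed=0):
    """Sample n_items vectors of n_bins values drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_bins)] for _ in range(n_items)]


def euclidean_loss(target, recon):
    """Mean squared error between two equal-length vectors."""
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)


def log_compressed_loss(target, recon, eps=1e-3):
    """Assumed placeholder for a perceptual metric: MSE after log compression.

    This is NOT the metric from the paper; it only shows how a different
    loss can weight the same error differently.
    """
    return sum(
        (math.log(t + eps) - math.log(r + eps)) ** 2
        for t, r in zip(target, recon)
    ) / len(target)


# Training targets are noise rather than natural audio.
batch = uniform_noise_batch(n_items=4, n_bins=8)
x = batch[0]

# Crude stand-in for a model output: the target shifted by a constant.
x_hat = [v + 0.05 for v in x]

print("euclidean:", euclidean_loss(x, x_hat))
print("log-compressed:", log_compressed_loss(x, x_hat))
```

The Euclidean loss treats the constant offset identically in every bin, whereas the log-compressed placeholder penalises the same offset more heavily where the target value is small. In a real experiment the placeholder would be replaced by an actual perceptual metric and the offset by the output of a trained autoencoder.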
