Audio-to-Audio Schrodinger Bridges
Kevin Shih · Zhifeng Kong · Weili Nie · Arash Vahdat · Sang-gil Lee · Joao Felipe Santos · Ante Jukić · Rafael Valle · Bryan Catanzaro
Abstract
Real-world audio is often degraded by numerous factors. This work presents an audio restoration model tailored to high-resolution (44.1 kHz) music. Our model, Audio-to-Audio Schrödinger Bridges, is capable of both bandwidth extension (predicting high-frequency components) and inpainting (regenerating missing segments). Critically, it is end-to-end: it requires no vocoder to predict waveform outputs and can restore hour-long audio inputs. It is also trained on permissively licensed music data. Our model achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets.
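The abstract defines the two restoration tasks only briefly. The sketch below illustrates what each task's degraded input might look like. It assumes an idealized brick-wall low-pass filter for bandwidth extension and a zeroed-out gap for inpainting. The function names `simulate_bandwidth_limit` and `simulate_gap`, along with the cutoff and gap parameters, are hypothetical illustrations and are not taken from the authors' training pipeline. The sketch requires NumPy.

```python
import numpy as np

SAMPLE_RATE = 44_100  # the 44.1 kHz resolution stated in the abstract


def simulate_bandwidth_limit(x: np.ndarray, cutoff_hz: float, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Hypothetical bandwidth-extension input: drop all content above cutoff_hz.

    Uses an idealized brick-wall filter in the frequency domain (an assumption;
    real degradations may use smoother filters or codecs).
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0  # remove the high-frequency components a model must predict
    return np.fft.irfft(spectrum, n=len(x))


def simulate_gap(x: np.ndarray, start_s: float, duration_s: float, sr: int = SAMPLE_RATE):
    """Hypothetical inpainting input: silence one segment and return its mask."""
    start = int(round(start_s * sr))
    stop = min(len(x), start + int(round(duration_s * sr)))
    mask = np.zeros(len(x), dtype=bool)
    mask[start:stop] = True  # True marks samples the model must regenerate
    degraded = x.copy()
    degraded[mask] = 0.0
    return degraded, mask


# Usage: one second of a 440 Hz tone plus a 12 kHz partial.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12_000 * t)
lowpassed = simulate_bandwidth_limit(x, cutoff_hz=8_000)       # 12 kHz partial removed
gapped, mask = simulate_gap(x, start_s=0.25, duration_s=0.1)   # 100 ms of silence
```

With a 1-second signal, the FFT bins fall exactly on 440 Hz and 12 kHz. As a result, the low-pass output reduces to the pure 440 Hz tone. A restoration model would receive `lowpassed` or `gapped` as input and be trained to recover `x`.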