The Cone of Silence: Speech Separation by Localization

Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman

Oral presentation: Orals & Spotlights Track 03: Language/Audio Applications
on 2020-12-07T18:30:00-08:00 - 2020-12-07T18:45:00-08:00
Poster Session 1
on 2020-12-07T21:00:00-08:00 - 2020-12-07T23:00:00-08:00
Abstract: Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm also allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly under high levels of background noise.
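
The binary search over angular windows described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the separation network is replaced by a hypothetical oracle, `has_source(theta, w)`, which simply reports whether a known source angle lies in the window $\theta \pm w/2$. The function names, the 90-degree starting window, and the 2-degree stopping width are all assumptions chosen for illustration.

```python
from typing import Callable, List


def signed_diff(a: float, theta: float) -> float:
    """Signed angular difference a - theta in degrees, wrapped to [-180, 180)."""
    return (a - theta + 180.0) % 360.0 - 180.0


def make_oracle(true_angles: List[float]) -> Callable[[float, float], bool]:
    """Hypothetical stand-in for the network: True if any source lies in the
    half-open window [theta - w/2, theta + w/2)."""
    def has_source(theta: float, w: float) -> bool:
        return any(-w / 2.0 <= signed_diff(a, theta) < w / 2.0
                   for a in true_angles)
    return has_source


def localize(has_source: Callable[[float, float], bool],
             initial_window: float = 90.0,
             min_window: float = 2.0) -> List[float]:
    """Coarse-to-fine search: keep only windows that contain a source,
    then halve the window and test the two halves of each survivor."""
    w = initial_window
    n = int(round(360.0 / w))
    # Windows of width w that tile the full circle.
    candidates = [(i + 0.5) * w for i in range(n)]
    candidates = [t for t in candidates if has_source(t, w)]
    while w > min_window:
        w /= 2.0
        survivors = []
        for theta in candidates:
            # The two children split the parent window exactly in half.
            for child in (theta - w / 2.0, theta + w / 2.0):
                child %= 360.0
                if has_source(child, w):
                    survivors.append(child)
        candidates = survivors
    return sorted(candidates)


# Three simulated speakers (angles in degrees, chosen arbitrarily).
estimates = localize(make_oracle([30.0, 200.0, 310.0]))
```

Because $w$ halves at each level, the number of levels grows with the logarithm of the ratio between the initial and final window sizes, and only windows that still contain a source are refined further. In the actual method the decision to keep a window would come from the network's output for that region rather than from known source angles.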
