
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu · Rundi Wu · Yuko Ishiwaka · Carl Vondrick · Changxi Zheng

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1666

We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications. Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word. In a recorded speech signal, those pauses introduce a series of time periods during which only noise is present. We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. Detected silent intervals over time expose not just pure noise but its time-varying features, allowing the model to learn noise dynamics and suppress it from the speech signal. Experiments on multiple datasets confirm the pivotal role of silent interval detection for speech denoising, and our method outperforms several state-of-the-art denoising methods, including those that accept only audio input (like ours) and those that denoise based on audiovisual input (and hence require more information). We also show that our method enjoys excellent generalization properties, such as denoising spoken languages not seen during training.
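The core idea above — use incidental silent intervals to estimate the noise and then suppress it — can be illustrated with a much simpler classical analog. The sketch below is not the paper's deep model; it is a hypothetical energy-based silence detector combined with spectral subtraction, using only NumPy, with all thresholds and frame sizes chosen arbitrarily for illustration.

```python
import numpy as np

def denoise_via_silence(audio, frame_len=1024, hop=512, thresh_db=-15.0):
    """Toy analog of silence-based denoising (NOT the paper's method):
    1) mark low-energy frames as 'silent intervals',
    2) average their spectra to estimate a noise profile,
    3) spectrally subtract that profile from every frame.
    The -15 dB threshold is an arbitrary heuristic for this sketch."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Frame energy in dB, relative to the loudest frame.
    energy = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    silent = energy < (energy.max() + thresh_db)
    if silent.any():
        noise_mag = mag[silent].mean(axis=0)  # estimated noise spectrum
    else:
        noise_mag = np.zeros(mag.shape[1])
    # Magnitude spectral subtraction, rectified at zero.
    clean_mag = np.maximum(mag - noise_mag, 0.0)
    clean_spec = clean_mag * np.exp(1j * phase)
    # Overlap-add resynthesis with window normalization.
    out = np.zeros(len(audio))
    norm = np.zeros(len(audio))
    for i, frame in enumerate(np.fft.irfft(clean_spec, n=frame_len, axis=1)):
        out[i * hop:i * hop + frame_len] += frame * win
        norm[i * hop:i * hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-12)
```

Unlike this static noise profile, the paper's model detects silent intervals over time and learns time-varying noise dynamics, which is what lets it handle non-stationary noise.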

Author Information

Ruilin Xu (Columbia University)
Rundi Wu (Columbia University)
Yuko Ishiwaka (SoftBank Corp.)

She received her Ph.D. for work on multiagent systems. She was an assistant at Hakodate National College of Technology before moving to Hokkaido University as an associate professor, and now works at SoftBank Corp. as a researcher. Her research interest is machine learning inspired by neuroscience to create new emergent systems. She is the project leader for computational neuroscience and cognitive systems at SoftBank Corp.

Carl Vondrick (Columbia University)
Changxi Zheng (Columbia University)
