
Invited talk
Workshop: Machine Learning for Audio

Multi-channel speech enhancement for moving sources

Shoko Araki

Sat 16 Dec 3 p.m. PST — 3:20 p.m. PST


Speech enhancement technology has made remarkable progress in recent years. Although many single-channel methods have been proposed and their performance has improved, multi-channel speech enhancement remains important because of its high performance in estimating and retaining sound-source spatial information. Most multi-channel processing methods proposed so far assume that the source and noise positions are fixed. For real-world applications, however, it is necessary to handle source movement and to improve robustness to moving sources.

In this presentation, I will introduce multi-channel speech enhancement technologies for moving sources. First, I will present an extension of mask-based neural beamforming, which is widely used as an ASR front-end, to moving sound sources. This extension is achieved by integrating model-based array signal processing with data-driven deep learning. I will then discuss model-based, unsupervised multi-channel source separation and extraction approaches, e.g., independent component/vector analysis (ICA/IVA). Beyond handling moving sources, multi-channel processing also requires techniques that limit the growth in computational complexity as the number of microphones increases. To address this issue, I will introduce a fast online IVA algorithm for tracking a single moving source that achieves optimal time complexity and runs significantly faster than conventional approaches.
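To make the mask-based neural beamforming idea concrete, here is a minimal sketch of the standard (static-source) pipeline it extends: time-frequency masks, e.g. predicted by a neural network, weight the observed STFT frames to form speech and noise spatial covariance matrices, from which an MVDR beamformer is computed. The function name, shapes, and the particular MVDR formulation are illustrative assumptions, not the speaker's actual implementation, which additionally handles source movement.

```python
import numpy as np

def mask_based_mvdr(stft, speech_mask, noise_mask, ref_mic=0):
    """Mask-based MVDR beamforming for a single frequency bin (illustrative sketch).

    stft:        (mics, frames) complex STFT observations at one frequency.
    speech_mask: (frames,) mask for the target speech (e.g. from a neural net).
    noise_mask:  (frames,) mask for the noise.
    Returns the beamformed signal, shape (frames,).
    """
    # Mask-weighted spatial covariance matrices of speech and noise.
    phi_s = (speech_mask * stft) @ stft.conj().T / speech_mask.sum()
    phi_n = (noise_mask * stft) @ stft.conj().T / noise_mask.sum()

    # One common steering-vector-free MVDR formulation:
    #   w = (Phi_n^{-1} Phi_s) e_ref / trace(Phi_n^{-1} Phi_s)
    numerator = np.linalg.solve(phi_n, phi_s)      # Phi_n^{-1} Phi_s
    w = numerator[:, ref_mic] / np.trace(numerator)

    # Apply the beamformer to all frames.
    return w.conj() @ stft
```

In the static case these covariances are averaged over the whole utterance; the extension discussed in the talk must instead track how the spatial statistics change as the source moves.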
