Skip to yearly menu bar Skip to main content


Competition

NeurIPS 2024 Competition Proposal: URGENT Challenge

Wangyou Zhang · Robin Scheibler · Kohei Saijo · Samuele Cornell · Chenda Li · Zhaoheng Ni · Anurag Kumar · Marvin Sach · Wei Wang · Shinji Watanabe · Tim Fingscheidt · Yanmin Qian

[ ]
Sun 15 Dec 8:15 a.m. PST — 5:30 p.m. PST

Abstract:

Speech enhancement (SE) is the task of improving the quality of the desired speech while suppressing other interference signals.Tremendous progress has been achieved in the past decade in deep learning-based SE approaches.However, existing SE studies are often limited in one or multiple aspects of the following: coverage of SE sub-tasks, diversity and amount of data (especially real-world evaluation data), and diversity of evaluation metrics.As the first step to fill this gap, we establish a novel SE challenge, called URGENT, to promote research towards universal SE.It concentrates on the universality, robustness, and generalizability of SE approaches.In the challenge, we extend the conventionally narrow SE definition to cover different sub-tasks, thus allowing the exploration of the limits of current SE models.We start with four SE sub-tasks, including denoising, dereverberation, bandwidth extension, and declipping.Note that handling the above sub-tasks within a single SE model has been challenging and underexplored in the SE literature due to the distinct data formats in different tasks.As a result, most existing SE approaches are only designed for a specific subtask.To address this issue, we propose a technically novel framework to unify all these sub-tasks in a single model, which is compatible to most existing SE approaches.Several state-of-the-art baselines with different popular architectures have been provided for this challenge, including TF-GridNet, BSRNN, and Conv-TasNet.We also take care of the data diversity and amount by collecting abundant public speech and noise data from different domains.This allows for the construction of diverse training and evaluation data.Additional real recordings are further used for evaluating robustness and generalizability.Different from existing SE challenges, we adopt a wide range of evaluation metrics to provide comprehensive insights into the true capability of both generative and discriminative SE approaches.We expect this challenge would not only provide valuable insights into the current status of SE research, but also attract more research towards building universal SE models with strong robustness and good generalizability.

Live content is unavailable. Log in and register to view live content