NeurIPS Competition Workshop for URGENT 2024 Challenge

Wangyou Zhang · Robin Scheibler · Kohei Saijo · Samuele Cornell · Chenda Li · Zhaoheng Ni · Anurag Kumar · Marvin Sach · Wei Wang · Yihui Fu · Shinji Watanabe · Tim Fingscheidt · Yanmin Qian

West Meeting Room 215, 216

Sat 14 Dec 1:30 p.m. PST — 4:10 p.m. PST

Abstract:

Speech enhancement (SE) is the task of improving the quality of the desired speech while suppressing other interference signals. Tremendous progress has been achieved in the past decade in deep learning-based SE approaches. However, existing SE studies are often limited in one or more of the following aspects: coverage of SE sub-tasks, diversity and amount of data (especially real-world evaluation data), and diversity of evaluation metrics. As a first step toward filling this gap, we establish a novel SE challenge, called URGENT, to promote research towards universal SE. It concentrates on the universality, robustness, and generalizability of SE approaches. In the challenge, we extend the conventionally narrow SE definition to cover different sub-tasks, thus allowing the exploration of the limits of current SE models. We start with four SE sub-tasks: denoising, dereverberation, bandwidth extension, and declipping. Note that handling the above sub-tasks within a single SE model has been challenging and underexplored in the SE literature due to the distinct data formats in different tasks. As a result, most existing SE approaches are designed for only a specific sub-task. To address this issue, we propose a technically novel framework to unify all these sub-tasks in a single model, which is compatible with most existing SE approaches. Several state-of-the-art baselines with different popular architectures have been provided for this challenge, including TF-GridNet, BSRNN, and Conv-TasNet. We also address the diversity and amount of data by collecting abundant public speech and noise data from different domains. This allows for the construction of diverse training and evaluation data. Additional real recordings are further used for evaluating robustness and generalizability. Unlike existing SE challenges, we adopt a wide range of evaluation metrics to provide comprehensive insights into the true capability of both generative and discriminative SE approaches. We expect this challenge to not only provide valuable insights into the current status of SE research, but also attract more research towards building universal SE models with strong robustness and good generalizability.

Schedule

Sat 1:30 p.m. - 1:45 p.m.	Opening Remarks (Presentation)	Samuele Cornell
Sat 1:45 p.m. - 2:00 p.m.	Presentation from team 'Bytedance-SMT-Audio' (Oral Presentation)	Xiaohuai Le
Sat 2:00 p.m. - 2:15 p.m.	Presentation from team 'NJU-AALab' (Oral Presentation)	Xiaobin Rong
Sat 2:15 p.m. - 2:30 p.m.	Presentation from team 'NAVS' (Oral Presentation)	Rong Chao
Sat 2:30 p.m. - 3:30 p.m.	Invited talk: The Journey Towards Universal Perception: Experiments in Unsupervised, Multi-Task, Multi-Domain, Multi-Modal, and Multi-Channel Learning (Oral Presentation)	John R. Hershey
Sat 3:30 p.m. - 3:45 p.m.	Presentation from team 'ALPACA' (Oral Presentation)	Seungu Han
Sat 3:45 p.m. - 4:00 p.m.	Presentation from team 'Hamburgers' (Oral Presentation)	Julius Richter
Sat 4:00 p.m. - 4:10 p.m.	Closing Remarks