DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement

Wenxin Tai · Yue Lei · Fan Zhou · Goce Trajcevski · Ting Zhong

Great Hall & Hall B1+B2 (level 1) #335
[ ] [ Project Page ]
Tue 12 Dec 3:15 p.m. PST — 5:15 p.m. PST


Speech enhancement (SE) aims to improve the intelligibility and quality of speech in the presence of non-stationary additive noise. Deterministic deep learning models have traditionally been used for SE, but recent studies have shown that generative approaches, such as denoising diffusion probabilistic models (DDPMs), can also be effective. However, incorporating condition information into DDPMs for SE remains a challenge. We propose a model-agnostic method called DOSE that employs two efficient condition-augmentation techniques to address this challenge, based on two key insights: (1) We force the model to prioritize the condition factor when generating samples by training it with dropout operation; (2) We inject the condition information into the sampling process by providing an informative adaptive prior. Experiments demonstrate that our approach yields substantial improvements in high-quality and stable speech generation, consistency with the condition factor, and inference efficiency. Codes are publicly available at

Chat is not available.