Tara Sainath: Multichannel Signal Processing with Deep Neural Networks for Automatic Speech Recognition

2016 Talk in Workshop: End-to-end Learning for Speech and Audio Processing

Abstract

Automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this talk, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework. Overall, we find that such multichannel neural networks give a relative word error rate improvement of more than 5% compared to a traditional beamforming-based multichannel ASR system and more than 10% compared to a single-channel model.
