Skip to yearly menu bar Skip to main content

Affinity Workshop: Women in Machine Learning

A Simple Phoneme-based Error Simulator for ASR Error Correction

Mohita Chowdhury · Oliver Gardiner · Yishu Miao


Despite the recent advances brought by deep neural networks, the real-world applications of Automatic Speech Recognition (ASR) inevitably suffer from various errors mostly caused by incorrectly captured phonetic features. This is of particular consequence in our work which involves the transcription of real patient clinical conversations. In this work, we aim to fix noisy off-the-shelf ASR transcriptions in low-data settings by building a simple phoneme-based error simulator that can generate large amounts of training data for post-editing ASR error correction systems. To demonstrate the efficacy of our simulated errors, we conduct experiments with different error correction architectures – our own multi-task trained dual-decoder transformer model that performs both error detection and error correction and two state-of-the-art grammatical error correction models. All these models improved in performance (by 0.3 - 1.4% WER) when pretrained on our simulated errors. Also, increasing the amount of simulated data in pretraining, from 0 to 1x and 10x the size of Librispeech, improves performance in the error correction task, regardless of the model structure. We are currently working to develop more domain-specific data to further improve transcriptions in clinical settings.

Chat is not available.