Andrew Maas: Lexicon-free conversational speech recognition by reasoning entirely at the character level

2016 Talk in Workshop: End-to-end Learning for Speech and Audio Processing

Abstract

I will present an approach to speech recognition that uses only three components: a neural network that maps acoustic input to characters, a character-level language model, and a beam search decoding procedure. Our approach builds on the connectionist temporal classification (CTC) speech recognition work of Graves & Jaitly, but reasons entirely at the character level. We demonstrate our approach on the Switchboard telephone conversation transcription task and show that reasoning at the character level enables natural handling of out-of-vocabulary words and partial word fragments. Finally, we analyze qualitative differences between the transcripts and alignments of our system and those of standard HMM-based recognizers.
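
To make the decoding step concrete, the following is a minimal sketch of a CTC prefix beam search that can optionally be weighted by a character-level language model. It is an illustration under stated assumptions, not the system presented in the talk: the function name `ctc_prefix_beam_search`, the blank symbol `_`, the `lm(prefix, ch)` callback, and the `alpha` weight are all hypothetical, and the per-frame probabilities stand in for the output of the acoustic neural network.

```python
from collections import defaultdict

BLANK = "_"  # hypothetical symbol chosen here for the CTC blank label


def ctc_prefix_beam_search(probs, alphabet, lm=None, beam_width=8, alpha=1.0):
    """Sketch of CTC prefix beam search over characters.

    probs:    one list per frame, giving a probability for each entry of
              `alphabet` (assumed to come from the acoustic network).
    lm:       optional callable lm(prefix, ch) -> P(ch | prefix); when it is
              None, no language model weighting is applied.
    alpha:    exponent applied to the language model probability.
    """
    # For each prefix keep (p_blank, p_non_blank): the total probability of
    # frame paths that collapse to the prefix and end in blank / non-blank.
    beams = {"": (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for ch, p in zip(alphabet, frame):
                if ch == BLANK:
                    # A blank leaves the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                    continue
                lm_p = lm(prefix, ch) ** alpha if lm else 1.0
                if prefix and ch == prefix[-1]:
                    # A repeated character extends the prefix only when the
                    # previous frame ended in blank ...
                    next_beams[prefix + ch][1] += p_b * p * lm_p
                    # ... otherwise it collapses into the same prefix.
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[prefix + ch][1] += (p_b + p_nb) * p * lm_p
        # Prune to the `beam_width` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_width]}
    return max(beams.items(), key=lambda kv: sum(kv[1]))[0]


# Toy usage: four frames over the alphabet (blank, "a", "b").
toy_probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
print(ctc_prefix_beam_search(toy_probs, [BLANK, "a", "b"]))  # prints "ab"
```

Because hypotheses are built one character at a time, nothing in this search restricts the output to a fixed word list, which is why out-of-vocabulary words and partial word fragments need no special treatment. A real system would work in log space for numerical stability and would plug in a trained character language model for `lm`.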
