
Workshop: I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification

On The Diversity of ASR Hypotheses In Spoken Language Understanding

Surya Kant Sahu · Swaraj Dalmia


In Conversational AI, an Automatic Speech Recognition (ASR) system transcribes the user's speech, and the ASR output is passed as input to a Spoken Language Understanding (SLU) system, which outputs semantic objects (such as intents and slot-act pairs). Recent work, including state-of-the-art SLU methods, uses either word lattices or N-best hypotheses from the ASR. The usual intuition for using the N-best list instead of only the 1-best hypothesis is that the additional hypotheses provide extra information when the ASR transcription contains errors, i.e., the performance gain is attributed to the word error rate (WER) of the ASR. We empirically show that the gain from using N-best hypotheses is only loosely related to WER, but is related to the diversity of the hypotheses.
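
The two quantities contrasted above can be made concrete with a small sketch. WER is computed here in the standard way, as word-level edit distance divided by reference length. The diversity score is an assumption for illustration only: the abstract does not say how the authors measure diversity, so `nbest_diversity` (a hypothetical helper) simply averages the normalized pairwise edit distance between hypotheses in an N-best list. The example shows that two N-best lists can have the same 1-best WER yet very different diversity.

```python
from itertools import combinations
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of one hypothesis against a reference transcript."""
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / max(len(ref), 1)


def nbest_diversity(hypotheses: List[str]) -> float:
    """Hypothetical diversity score (an assumption, not the paper's metric):
    mean pairwise word edit distance, each pair normalized by the longer hypothesis."""
    pairs = list(combinations(hypotheses, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        ta, tb = a.split(), b.split()
        total += word_edit_distance(ta, tb) / max(len(ta), len(tb), 1)
    return total / len(pairs)


if __name__ == "__main__":
    reference = "play the next song"
    # Both N-best lists share the same (correct) 1-best hypothesis.
    diverse = ["play the next song", "play the next son", "lay the next song"]
    redundant = ["play the next song"] * 3

    print(wer(reference, diverse[0]), wer(reference, redundant[0]))  # 0.0 0.0
    print(round(nbest_diversity(diverse), 3))    # 0.333
    print(round(nbest_diversity(redundant), 3))  # 0.0
```

Under this toy measure, identical 1-best WER says nothing about how much extra, varied evidence the remaining hypotheses carry, which is the distinction the abstract draws.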
