

Poster

Correlated Bigram LSA for Unsupervised Language Model Adaptation

Yik-Cheung Tam · Tanja Schultz


Abstract:

We propose correlated bigram LSA for unsupervised language model (LM) adaptation in automatic speech recognition. The model is trained with an efficient variational EM procedure and smoothed with the proposed fractional Kneser-Ney smoothing, which handles fractional counts. The approach scales to large training corpora by bootstrapping bigram LSA from unigram LSA. For LM adaptation, the unigram and bigram LSA models are integrated into the background N-gram LM via marginal adaptation and linear interpolation, respectively. On the Mandarin RT04 test set, applying unigram and bigram LSA together yields a 6%--8% relative perplexity reduction and a 0.6% absolute reduction in character error rate (CER) compared to applying unigram LSA alone. Compared with the unadapted baseline, our approach reduces the absolute CER by 1.2%.
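As a brief sketch of the two integration strategies named above (the notation, weights, and symbols below are our own illustrative choices, not taken from the paper): marginal adaptation rescales the background LM by the ratio of the LSA and background unigram marginals, while bigram LSA is combined with the background N-gram LM by linear interpolation:

\[
P_{\mathrm{adapt}}(w \mid h) \;\propto\; \left(\frac{P_{\mathrm{LSA}}(w)}{P_{\mathrm{bg}}(w)}\right)^{\beta} P_{\mathrm{bg}}(w \mid h)
\qquad \text{(marginal adaptation)}
\]
\[
P(w \mid h) \;=\; \lambda \, P_{\mathrm{adapt}}(w \mid h) \;+\; (1-\lambda)\, P_{\mathrm{bigLSA}}(w \mid w_{-1})
\qquad \text{(linear interpolation)}
\]

Here \(\beta\) and \(\lambda\) are tuning parameters, \(w_{-1}\) denotes the preceding word, and the marginal-adaptation distribution must be renormalized over the vocabulary.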
