Acoustic Degradation Reweights Cortical and ASR Processing: A Brain-Model Alignment Study
Francis Pingfan Chien · Chia-Chun Hsu · Po-Jang Hsieh · Yu Tsao
Abstract
We tested whether acoustic degradation changes how a modern ASR represents speech and whether those changes explain human brain responses and behavior. Twenty-five participants listened to clean and noisy ($-3$ dB SNR) Mandarin sentences during fMRI while we extracted layer-wise embeddings from Whisper-Tiny. We computed brain scores normalized to each ROI's noise ceiling. Behavioral assessments, including intelligibility, perceived quality, and comprehension, declined under noise. Under clean speech, alignment emphasized frontal predictive processing, with encoder layers 3 and 4 peaking in the right inferior frontal gyrus (IFG) and decoder layer 2 peaking in the left middle frontal gyrus (MFG). Under noisy speech, alignment shifted toward early acoustic and evaluative regions, with encoder layer 1 peaking in the right Heschl's gyrus and encoder layer 4 peaking in the right IFG pars orbitalis (IFGorb); decoder peaks were weaker and more diffuse. Condition contrasts showed higher alignment for clean speech in the right IFG (encoder layers 3 and 4) and the left MFG (decoder layer 2). Together, these findings support a processing account of degraded-speech listening, a behavioral link, and a compact layer-to-region map across listening conditions.
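The ceiling-normalized brain score mentioned above can be illustrated with a minimal sketch. This is not the authors' pipeline; all data here are simulated, and the split-half, Spearman-Brown-corrected ceiling is one common choice of noise ceiling, assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two scan repeats of one ROI's responses to N sentences,
# plus a model prediction (e.g., a ridge-regression readout of a Whisper-Tiny
# layer embedding). All names and parameters here are illustrative.
n_sentences = 200
signal = rng.normal(size=n_sentences)                     # shared "true" response
rep1 = signal + rng.normal(scale=0.5, size=n_sentences)   # repeat 1 (+ noise)
rep2 = signal + rng.normal(scale=0.5, size=n_sentences)   # repeat 2 (+ noise)
model_pred = signal + rng.normal(scale=0.8, size=n_sentences)

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Noise ceiling: split-half reliability with Spearman-Brown correction,
# estimating the best correlation any model could achieve for this ROI.
r_split = pearson(rep1, rep2)
noise_ceiling = 2 * r_split / (1 + r_split)

# Raw brain score: correlation of the model prediction with the mean response.
raw_score = pearson(model_pred, (rep1 + rep2) / 2)

# Ceiling-normalized brain score: fraction of explainable signal captured.
normalized_score = raw_score / noise_ceiling
```

Normalizing by the per-ROI ceiling makes scores comparable across regions with different measurement reliability, which is why layer-to-region peaks can be contrasted across conditions.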