

Poster

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Yen-Ju Lu · Jing Liu · Thomas Thebaud · Laureano Moro-Velazquez · Ariya Rastrow · Najim Dehak · Jesus Villalba

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Unlike standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on the input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model's capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs attention mechanisms and linear modulation to dynamically adjust scaling and biasing, tailoring the model's response at each time step. Our initialization technique allows the conditioning module to perform identity transformations, ensuring that the existing model behavior is maintained when incorporating new conditions. Our experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1.
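As a rough illustration of the conditioning mechanism described in the abstract, the PyTorch sketch below combines cross-attention to a condition embedding with FiLM-style linear modulation (per-time-step scaling and biasing) and zero-initializes the modulation heads so the module starts as an identity transform. All class, function, and parameter names are hypothetical and do not come from the paper or its codebase.

```python
# Minimal sketch of condition-aware linear modulation, assuming a PyTorch SSL
# backbone such as HuBERT or WavLM. Illustrative only, not the authors' code.
import torch
import torch.nn as nn


class ConditionAwareModulation(nn.Module):
    """Scales and biases hidden states per time step from a condition embedding
    (e.g., a language or speaker embedding). Zero-initialized projections make
    the module an identity transform at the start of training, preserving the
    base SSLR's behavior."""

    def __init__(self, hidden_dim: int, cond_dim: int, num_heads: int = 4):
        super().__init__()
        # Cross-attention lets every frame attend to the condition embedding,
        # producing a time-varying conditioning vector.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.cond_proj = nn.Linear(cond_dim, hidden_dim)
        # Linear modulation heads: predict a per-frame scale delta and bias.
        self.to_scale = nn.Linear(hidden_dim, hidden_dim)
        self.to_bias = nn.Linear(hidden_dim, hidden_dim)
        # Identity initialization: the scale starts at exactly 1 and the bias
        # at 0, so inserting the module leaves the pretrained model unchanged.
        nn.init.zeros_(self.to_scale.weight)
        nn.init.zeros_(self.to_scale.bias)
        nn.init.zeros_(self.to_bias.weight)
        nn.init.zeros_(self.to_bias.bias)

    def forward(self, hidden: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, time, hidden_dim); condition: (batch, cond_dim)
        cond = self.cond_proj(condition).unsqueeze(1)   # (batch, 1, hidden_dim)
        ctx, _ = self.attn(hidden, cond, cond)          # (batch, time, hidden_dim)
        scale = 1.0 + self.to_scale(ctx)                # 1 at initialization
        bias = self.to_bias(ctx)                        # 0 at initialization
        return scale * hidden + bias


if __name__ == "__main__":
    layer = ConditionAwareModulation(hidden_dim=768, cond_dim=256)
    h = torch.randn(2, 100, 768)   # frame-level SSL features
    c = torch.randn(2, 256)        # language or speaker embedding
    out = layer(h, c)
    assert torch.allclose(out, h)  # identity transform before any training
```

In a full model, one such module would typically be inserted after selected intermediate layers of the SSL encoder, with the language and speaker embeddings predicted from earlier layers as the abstract describes.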
