

Poster

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Wei-Ning Hsu · Yu Zhang · James Glass

Pacific Ballroom #115

Keywords: [ Speech Recognition ] [ Audio and Speech Processing ] [ Latent Variable Models ] [ Representation Learning ] [ Generative Models ] [ Deep Autoencoders ] [ Unsupervised Learning ]


Abstract:

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors on different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speaker identity or linguistic content by manipulating different sets of latent variables, and, quantitatively, its ability to outperform an i-vector baseline for speaker verification and to reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition.
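
As a rough illustration of the factorization described in the abstract, the sketch below (assuming PyTorch; the class name FactorizedVAE, the per-sequence embedding table, and all dimensions are illustrative choices, not taken from the paper) splits the latent variables into one set with a learned, sequence-dependent Gaussian prior and another set with a standard, sequence-independent prior:

# Minimal sketch, not the authors' code: a VAE over fixed-length segments whose
# latent space is split into a sequence-level set (sequence-dependent prior) and
# a segment-level set (sequence-independent standard-normal prior).
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class FactorizedVAE(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, z_seq_dim=32, z_seg_dim=32, n_sequences=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.to_seq = nn.Linear(hidden, 2 * z_seq_dim)  # posterior params for sequence-level latent
        self.to_seg = nn.Linear(hidden, 2 * z_seg_dim)  # posterior params for segment-level latent
        # Toy decoder: predicts a single frame broadcast over time; a real model
        # would decode the full segment.
        self.decoder = nn.Sequential(
            nn.Linear(z_seq_dim + z_seg_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, feat_dim))
        # One learned prior mean per training sequence: this is what makes the
        # prior on z_seq "sequence-dependent".
        self.seq_prior_mu = nn.Embedding(n_sequences, z_seq_dim)

    def forward(self, x, seq_id):
        # x: (batch, time, feat_dim) segments; seq_id: (batch,) index of the source sequence
        _, (h, _) = self.encoder(x)
        h = h[-1]
        mu_a, logvar_a = self.to_seq(h).chunk(2, dim=-1)
        mu_b, logvar_b = self.to_seg(h).chunk(2, dim=-1)
        q_seq = Normal(mu_a, (0.5 * logvar_a).exp())
        q_seg = Normal(mu_b, (0.5 * logvar_b).exp())
        z_seq, z_seg = q_seq.rsample(), q_seg.rsample()
        x_hat = self.decoder(torch.cat([z_seq, z_seg], dim=-1))

        # Sequence-dependent prior vs. sequence-independent (standard normal) prior.
        p_seq = Normal(self.seq_prior_mu(seq_id), torch.ones_like(mu_a))
        p_seg = Normal(torch.zeros_like(mu_b), torch.ones_like(mu_b))
        recon = ((x_hat.unsqueeze(1) - x) ** 2).mean()
        kl = kl_divergence(q_seq, p_seq).sum(-1).mean() + kl_divergence(q_seg, p_seg).sum(-1).mean()
        return recon + kl

# Usage (hypothetical shapes): 4 segments of 20 frames, each tagged with its source utterance.
# model = FactorizedVAE()
# loss = model(torch.randn(4, 20, 80), torch.tensor([0, 0, 1, 2]))
# loss.backward()

In this toy version the sequence-dependent prior mean is simply looked up from an embedding table indexed by the training-sequence ID; the paper's actual model and training objective differ in their details.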
