One of the great successes of end-to-end learning strategies such as Connectionist Temporal Classification (CTC) in automatic speech recognition is the ability to train very powerful models that map directly from acoustic features to characters or context-independent phones. Traditional hybrid models, or even GMM-based systems, usually require context-dependent states and a Hidden Markov Model to achieve good performance. With CTC, by contrast, it becomes possible to train a multilingual RNN that directly predicts phones in multiple languages (multi-task training), language-independent articulatory features, and language-universal phones, allowing speech to be recognized in languages for which no acoustic training data is available.
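The following is a minimal sketch of the feature-to-phone CTC setup described above, assuming PyTorch's nn.CTCLoss; the model size, phone inventory, and feature dimensions are illustrative placeholders, not the configuration used in the talk.

```python
# Hedged sketch: a bidirectional LSTM trained with CTC to map acoustic
# features directly to phone labels (no HMM, no context-dependent states).
import torch
import torch.nn as nn

NUM_PHONES = 60   # hypothetical language-universal phone inventory
FEAT_DIM = 40     # e.g. log-mel filterbank features
HIDDEN = 256

class CTCPhoneRecognizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(FEAT_DIM, HIDDEN, num_layers=2,
                           bidirectional=True, batch_first=True)
        # +1 output class for the CTC blank symbol (index 0)
        self.proj = nn.Linear(2 * HIDDEN, NUM_PHONES + 1)

    def forward(self, feats):                  # feats: (batch, time, FEAT_DIM)
        out, _ = self.rnn(feats)
        return self.proj(out).log_softmax(-1)  # (batch, time, NUM_PHONES + 1)

model = CTCPhoneRecognizer()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

# Dummy batch: 4 utterances of 100 frames, phone targets of length 20
feats = torch.randn(4, 100, FEAT_DIM)
targets = torch.randint(1, NUM_PHONES + 1, (4, 20))   # labels 1..NUM_PHONES
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 20, dtype=torch.long)

log_probs = model(feats).transpose(0, 1)      # CTCLoss expects (time, batch, classes)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

Multilingual or multi-task variants of this idea would share the encoder and swap or union the output phone inventory across languages; that extension is not shown here.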
Author Information
Florian Metze (Meta)
More from the Same Authors
- 2022 Poster: Masked Autoencoders that Listen »
  Po-Yao Huang · Hu Xu · Juncheng Li · Alexei Baevski · Michael Auli · Wojciech Galuba · Florian Metze · Christoph Feichtenhofer
- 2021 Poster: Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers »
  Mandela Patrick · Dylan Campbell · Yuki Asano · Ishan Misra · Florian Metze · Christoph Feichtenhofer · Andrea Vedaldi · João Henriques
- 2021 Oral: Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers »
  Mandela Patrick · Dylan Campbell · Yuki Asano · Ishan Misra · Florian Metze · Christoph Feichtenhofer · Andrea Vedaldi · João Henriques
- 2019 Poster: Adversarial Music: Real world Audio Adversary against Wake-word Detection System »
  Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze
- 2019 Spotlight: Adversarial Music: Real world Audio Adversary against Wake-word Detection System »
  Juncheng Li · Shuhui Qu · Xinjian Li · Joseph Szurley · J. Zico Kolter · Florian Metze
- 2016: Spotlights and Posters »
  Florian Metze