Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional effort from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary (the existing prior information in the natural language). Specifically, we design a semantics-to-pronunciation attention (S2PA) module to match the semantic patterns between the input text sequence and the prior semantics in the dictionary and obtain the corresponding pronunciations. The S2PA module can be easily trained with the end-to-end TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design in Dict-TTS is effective. The code is available at https://github.com/Zain-Jiang/Dict-TTS.
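The matching step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `s2pa_sketch`, the hand-written two-dimensional vectors, and the hard arg-max selection are all assumptions made for illustration. It only shows the general idea of scoring a context vector against the semantic vectors of dictionary entries with scaled dot-product attention and reading off the pronunciation of the best-matching entry.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def s2pa_sketch(context_vec, entries):
    """Hypothetical helper: pick a pronunciation by semantic attention.

    context_vec: vector standing in for the semantics of the input text.
    entries: list of (pronunciation, gloss_vec) pairs standing in for
             dictionary entries of one polyphonic word.
    Returns the pronunciation with the highest attention weight and
    the full list of weights.
    """
    scale = math.sqrt(len(context_vec))
    scores = [dot(context_vec, gloss) / scale for _, gloss in entries]
    weights = softmax(scores)
    best = max(range(len(entries)), key=lambda i: weights[i])
    return entries[best][0], weights


# Toy dictionary for the English polyphone "lead" (vectors are made up).
entries = [
    ("L IY D", [1.0, 0.0]),  # sense: to guide
    ("L EH D", [0.0, 1.0]),  # sense: the metal
]
# Made-up context vector for a sentence about a "lead pipe".
context = [0.1, 0.9]

pron, weights = s2pa_sketch(context, entries)
print(pron, weights)
```

In a trained model the vectors would be learned representations rather than hand-written numbers, and the attention weights could be used softly (for example, to mix pronunciation embeddings) instead of through a hard selection, which is what allows training without phoneme labels.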
Author Information
Ziyue Jiang (Zhejiang University)
Zhe Su (Zhejiang University)
Zhou Zhao (Zhejiang University)
Qian Yang (Zhejiang University)
Yi Ren (Zhejiang University)
Jinglin Liu (Zhejiang University)
振辉 叶 (Zhejiang University)
More from the Same Authors
- 2022 Poster: GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
  Rongjie Huang · Yi Ren · Jinglin Liu · Chenye Cui · Zhou Zhao
- 2022 Poster: Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
  Yang Zhao · Chen Zhang · Haifeng Huang · Haoyuan Li · Zhou Zhao
- 2022 Poster: M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
  Lichao Zhang · Ruiqi Li · Shoutong Wang · Liqun Deng · Jinglin Liu · Yi Ren · Jinzheng He · Rongjie Huang · Jieming Zhu · Xiao Chen · Zhou Zhao
- 2022: Recommendation for New Drugs with Limited Prescription Data
  Zhenbang Wu · Huaxiu Yao · Zhe Su · David Liebovitz · Lucas Glass · James Zou · Chelsea Finn · Jimeng Sun
- 2023 Poster: Achieving Cross Modal Generalization with Multimodal Unified Representation
  Yan Xia · Hai Huang · Jieming Zhu · Zhou Zhao
- 2023 Poster: Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective
  Pengfei Wei · Lingdong Kong · Xinghua Qu · Yi Ren · Zhiqiang Xu · Jing Jiang · Xiang Yin
- 2023 Poster: Connecting Multi-modal Contrastive Representations
  Zehan Wang · Yang Zhao · Xize 成 · Haifeng Huang · Jiageng Liu · Aoxiong Yin · Li Tang · Linjun Li · Yongqi Wang · Ziang Zhang · Zhou Zhao
- 2023 Poster: Uncovering and Quantifying Social Biases in Code Generation
  Yan Liu · Xiaokang Chen · Yan Gao · Zhe Su · Fengji Zhang · Daoguang Zan · Jian-Guang Lou · Pin-Yu Chen · Tsung-Yi Ho
- 2023 Poster: Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
  Haoyi Duan · Yan Xia · Zhou Mingze · Li Tang · Jieming Zhu · Zhou Zhao
- 2022 Spotlight: Lightning Talks 4B-4
  Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · Sheng Zhao · Tie-Yan Liu
- 2022 Spotlight: Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
  Ziyue Jiang · Zhe Su · Zhou Zhao · Qian Yang · Yi Ren · Jinglin Liu · 振辉 叶
- 2022 Spotlight: GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
  Rongjie Huang · Yi Ren · Jinglin Liu · Chenye Cui · Zhou Zhao
- 2022 Spotlight: M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
  Lichao Zhang · Ruiqi Li · Shoutong Wang · Liqun Deng · Jinglin Liu · Yi Ren · Jinzheng He · Rongjie Huang · Jieming Zhu · Xiao Chen · Zhou Zhao
- 2022 Poster: Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models
  Zijian Zhang · Zhou Zhao · Zhijie Lin
- 2021 Poster: PortaSpeech: Portable and High-Quality Generative Text-to-Speech
  Yi Ren · Jinglin Liu · Zhou Zhao
- 2021 Poster: Generalizable Multi-linear Attention Network
  Tao Jin · Zhou Zhao
- 2020 Poster: Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding
  Zhu Zhang · Zhou Zhao · Zhijie Lin · Jieming Zhu · Xiuqiang He
- 2019 Poster: FastSpeech: Fast, Robust and Controllable Text to Speech
  Yi Ren · Yangjun Ruan · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu
- 2018 Poster: MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models
  Boyuan Pan · Yazheng Yang · Hao Li · Zhou Zhao · Yueting Zhuang · Deng Cai · Xiaofei He