rezackova21@interspeech_2021@ISCA


#1 T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion

Authors: Markéta Řezáčková ; Jan Švec ; Daniel Tihelka

Despite the increasing popularity of end-to-end text-to-speech (TTS) systems, an accurate grapheme-to-phoneme (G2P) module remains a crucial part of systems that rely on phonetic input. In this paper, we therefore introduce T5G2P, a Text-to-Text Transfer Transformer (T5) neural network model that converts an input text sentence into a phoneme sequence with high accuracy. We evaluate the trained T5 model on English and Czech, since the two languages pose different G2P challenges, including homograph disambiguation, cross-word assimilation, and irregular pronunciation of loanwords. The paper also analyzes the homograph problem in English and offers an alternative approach to Czech phonetic transcription based on detecting pronunciation exceptions.
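
To illustrate why the abstract stresses sentence-level conversion, the following minimal sketch contrasts a word-by-word lexicon lookup, which cannot resolve a homograph such as "read", with a text-to-text framing in which the whole sentence is paired with its whole phoneme sequence. The toy lexicon, the `g2p: ` task prefix, and the names `ambiguous_words`, `Seq2SeqExample`, and `make_example` are assumptions made for illustration; they are not taken from the paper and do not reflect the authors' data format or training setup.

```python
# Toy illustration only: the lexicon entries and the "g2p: " task prefix are
# assumptions for this sketch, not details from the paper.
from dataclasses import dataclass

# Word-level lexicon (IPA). A homograph maps to more than one pronunciation.
LEXICON = {
    "i": ["aɪ"],
    "read": ["riːd", "rɛd"],  # present tense vs. past tense
    "a": ["ə"],
    "book": ["bʊk"],
}


def ambiguous_words(sentence: str) -> list[str]:
    """Return the words that a context-free, word-by-word lookup cannot resolve."""
    return [w for w in sentence.lower().split() if len(LEXICON.get(w, [])) > 1]


@dataclass
class Seq2SeqExample:
    source: str  # the whole input sentence, with a hypothetical task prefix
    target: str  # the whole phoneme sequence for that sentence


def make_example(sentence: str, phonemes: str, prefix: str = "g2p: ") -> Seq2SeqExample:
    """Pair a sentence with its phoneme sequence as one text-to-text example."""
    return Seq2SeqExample(source=prefix + sentence, target=phonemes)


# The lexicon alone cannot choose a pronunciation for "read" ...
unresolved = ambiguous_words("I read a book")

# ... whereas a sentence-level pair lets a sequence model use the context.
example = make_example("I read a book", "aɪ riːd ə bʊk")
```

Because the source and target are whole sentences, the same framing also leaves room for effects that span word boundaries, such as the cross-word assimilation mentioned for Czech.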