klimkov17@interspeech_2017@ISCA

Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information

Authors: Viacheslav Klimkov ; Adam Nadolski ; Alexis Moinet ; Bartosz Putrycz ; Roberto Barra-Chicote ; Thomas Merritt ; Thomas Drugman

Phrasing structure is one of the most important factors in increasing the naturalness of text-to-speech (TTS) systems, in particular for long-form reading. Most existing TTS systems are optimized for isolated short sentences, and completely discard the larger context or structure of the text. This paper presents how we have built phrasing models based on data extracted from audiobooks. We investigate how various types of textual features can improve phrase break prediction: part-of-speech (POS), guess POS (GPOS), dependency tree features and word embeddings. These features are fed into a bidirectional LSTM or a CART baseline. The resulting systems are compared using both objective and subjective evaluations. Using BiLSTM and word embeddings proves to be beneficial.
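
Phrase break prediction as described above can be read as per-word binary labelling: for each word, decide whether a break follows it. The sketch below is only an illustration of that framing, not the authors' pipeline. It builds a POS context window of the kind a CART or BiLSTM might consume and scores a trivial punctuation-only baseline. The toy sentence, the tag set, the helper names `window_features`, `punctuation_baseline` and `f1_score`, and the choice of F1 as the objective metric are all assumptions; the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: (word, POS tag, break_after) triples.
Token = Tuple[str, str, int]


def window_features(sent: List[Token], i: int, width: int = 1) -> Dict[str, str]:
    """POS tags in a +/-width window around word i; out-of-range slots are padded."""
    feats = {}
    for off in range(-width, width + 1):
        j = i + off
        feats[f"pos[{off:+d}]"] = sent[j][1] if 0 <= j < len(sent) else "PAD"
    return feats


def punctuation_baseline(sent: List[Token]) -> List[int]:
    """Assumed naive baseline: predict a break only after punctuation tokens."""
    return [1 if pos == "PUNCT" else 0 for _, pos, _ in sent]


def f1_score(gold: List[int], pred: List[int]) -> float:
    """F1 on the 'break' class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One sentence with a break after "door" that punctuation alone cannot reveal.
sent: List[Token] = [
    ("The", "DET", 0), ("man", "NOUN", 0), ("next", "ADJ", 0), ("door", "NOUN", 1),
    ("sold", "VERB", 0), ("his", "PRON", 0), ("house", "NOUN", 0), (".", "PUNCT", 1),
]

gold = [label for _, _, label in sent]
pred = punctuation_baseline(sent)
features = [window_features(sent, i) for i in range(len(sent))]
score = f1_score(gold, pred)
```

In this toy case the punctuation baseline misses the sentence-internal break, which is the kind of gap that richer features such as POS context or word embeddings are meant to close.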