isogai10@interspeech_2010@ISCA

#1 Speech database reduction method for corpus-based TTS system

Authors: Mitsuaki Isogai ; Hideyuki Mizuno

We propose a new speech database reduction method that creates efficient speech databases for concatenation-type corpus-based TTS systems. Our aim is to create small speech databases that yield the highest possible output speech quality. The main points of the proposed method are as follows: (1) It uses a 2-stage algorithm to reduce the speech database size. (2) Considering the speech elements actually needed allows us to select the most suitable subset of a full-size database; this yields scalable downsized speech databases. A listening test shows that the proposed method can reduce a database from 13 hours to 10 hours with no degradation in output quality. Furthermore, speech synthesized with database sizes of 8 and 6 hours maintains a relatively high MOS of more than 3.5, which is 95% of the MOS obtained with the full-size database.