nagino04@interspeech_2004@ISCA

#1 Design of ready-made acoustic model library by two-dimensional visualization of acoustic space

Authors: Goshu Nagino ; Makoto Shozakai

This paper proposes a technique for designing a ready-made library of small, high-performance acoustic models and for providing one of these models to a user without overburdening the user. The technique relies on the COSMOS method (aCOustic Space Map Of Sound), which visualizes multiple HMM acoustic models in two-dimensional space. The acoustic space (expressed in multi-dimensional feature parameters) is partitioned into zones in the two-dimensional space, and an acoustic model is generated for each zone, which allows highly precise acoustic models to be created. The set of these acoustic models is called an acoustic model library. In the experiment reported in this paper, a plotted map (called the COSMOS map) of 145 male speakers speaking in various styles was generated with the COSMOS method. Using the COSMOS map, the distribution of each speaking style and the relationship between a speaker's position on the map and speech-recognition performance were analyzed, demonstrating the effectiveness of the COSMOS method for analyzing acoustic space. The COSMOS map was then partitioned into concentric acoustic space zones, and an acoustic model representing each zone was produced. When the acoustic model giving the maximum likelihood score was selected using voice samples of only 5 words, that model, even though expressed with a single Gaussian distribution, performed comparably to a speaker-independent acoustic model (called the SI-model) expressed with 16-mixture Gaussian distributions. Furthermore, it outperformed the SI-model adapted with voice samples of 30 words by the MLLR [2] method.
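
The two steps the abstract describes, assigning map points to concentric zones and picking the zone model with the maximum likelihood score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy library, and the numbers are hypothetical, and each zone model is simplified to one diagonal-covariance Gaussian over feature frames rather than an HMM.

```python
import math


def zone_index(point, center, radii):
    """Return the index of the concentric zone containing a 2-D map point.

    `radii` is an ascending list of outer radii (an assumed zone layout).
    Points beyond the last radius fall into the outermost zone, len(radii).
    """
    dist = math.hypot(point[0] - center[0], point[1] - center[1])
    for i, r in enumerate(radii):
        if dist <= r:
            return i
    return len(radii)


def log_likelihood(frames, mean, var):
    """Total log-likelihood of feature frames under one diagonal Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_model(frames, library):
    """Pick the zone model with the highest likelihood on the given frames."""
    return max(library, key=lambda name: log_likelihood(frames, *library[name]))


# Toy library: zone name -> (mean vector, variance vector). Values are made up.
library = {
    "zone0": ([0.0, 0.0], [1.0, 1.0]),
    "zone1": ([3.0, 3.0], [1.0, 1.0]),
}

# Stand-in for features extracted from a user's short voice sample.
frames = [[2.8, 3.1], [3.2, 2.9]]

print(select_model(frames, library))
print(zone_index((0.5, 0.5), (0.0, 0.0), [1.0, 2.0]))
```

In this sketch the user only supplies a short sample; the library lookup is a simple argmax over precomputed zone models, which is one plausible reading of how a model could be provided "without overburdening users".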