Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

Authors: Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Robot audition systems require capabilities for sound source separation and for recognizing the separated sounds, since in daily life we hear mixtures of sounds, especially mixtures of speech. We report a robot audition system, using a pair of omni-directional microphones embedded in a humanoid, that recognizes two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Spectral distortion in the separated sounds is then estimated to generate missing feature masks. Finally, the separated sounds are recognized by Automatic Speech Recognition (ASR) based on missing-feature theory (MFT). The novel aspects of our system are that spectral distortion is estimated in the time-frequency domain in terms of feature vectors, and that this estimate is based on the estimated errors in the SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7%.
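
The mask-generation and MFT steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the distortion estimate is turned into a mask or how the recognizer uses it, so everything here is an assumption made for illustration: the function names `missing_feature_mask` and `masked_log_likelihood`, the relative-error rule with its `threshold` value, and the diagonal-Gaussian scoring that simply skips unreliable feature dimensions. It is not the authors' implementation.

```python
import numpy as np


def missing_feature_mask(separated, error_estimate, threshold=0.5):
    """Return a binary mask per (frame, feature) cell: 1 = reliable, 0 = unreliable.

    Assumption: a cell is reliable when the estimated separation error is
    small relative to the separated feature value. The threshold is arbitrary.
    """
    separated = np.asarray(separated, dtype=float)
    error_estimate = np.asarray(error_estimate, dtype=float)
    eps = 1e-12  # avoids division by zero for silent cells
    ratio = np.abs(error_estimate) / (np.abs(separated) + eps)
    return (ratio < threshold).astype(float)


def masked_log_likelihood(features, mask, mean, var):
    """Diagonal-Gaussian log-likelihood per frame, summed over reliable dims only.

    Assumption: unreliable dimensions are dropped from the score, which is one
    common way of applying a missing feature mask in the acoustic model.
    """
    features = np.asarray(features, dtype=float)
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (features - mean) ** 2 / var)
    return np.sum(mask * per_dim, axis=-1)


# Toy data: 2 frames x 2 feature dimensions (values are made up).
separated = np.array([[1.0, 2.0], [4.0, 0.5]])
error_estimate = np.array([[0.1, 1.5], [1.0, 0.4]])

mask = missing_feature_mask(separated, error_estimate)
scores = masked_log_likelihood(separated, mask, mean=0.0, var=1.0)
```

In this toy case the second feature dimension of each frame has a large relative error, so it is masked out and only the first dimension contributes to the frame score.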

Subject: INTERSPEECH.2006 - Speech Recognition

takeda06@interspeech_2006@ISCA
