kolossa12@interspeech_2012@ISCA

Total: 1

#1 Inventory-based audio-visual speech enhancement [PDF] [Copy] [Kimi1]

Authors: Dorothea Kolossa ; Robert Nickel ; Steffen Zeiler ; Rainer Martin

In this paper we propose to combine audio-visual speech recognition with inventory-based speech synthesis for speech enhancement. Unlike traditional filtering-based speech enhancement, inventory-based speech synthesis avoids the usual trade-off between noise reduction and consequential speech distortion. For this purpose, the processed speech signal is composed from a given speech inventory which contains snippets of speech from a targeted speaker. However, the combination of speech recognition and synthesis is susceptible to noise as recognition errors can lead to a suboptimal selection of speech segments. The search for fitting clean speech segments can be significantly improved when audio-visual information is utilized by means of a coupled HMM recognizer and an uncertainty decoding framework. First results using this novel system are reported in terms of several instrumental measures for three types of noise.