hashimoto14@interspeech_2014@ISCA

Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization

Authors: Naoaki Hashimoto ; Shoichi Nakano ; Kazumasa Yamamoto ; Seiichi Nakagawa

We considered a speech recognition method for mixed sound, composed of both speech and music, that removes only the music by means of non-negative matrix factorization (NMF). We used the Itakura-Saito divergence instead of the Kullback-Leibler divergence as the cost function, and imposed dynamics and sparseness constraints on the weight matrix to improve speech recognition. For isolated word recognition using the matched condition model, we achieved a 52.1% relative reduction in word error rate compared with the case in which music was not removed (on average, word accuracy improved from 69.3% to 85.3%, i.e. the word error rate fell from 30.7% to 14.7%).
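
As an illustration of the cost function named in the abstract, the following is a minimal sketch of plain NMF under the Itakura-Saito divergence, using the widely used multiplicative update rules. It is not the authors' implementation: it omits the dynamics and sparseness constraints on the weight matrix and any pre-trained speech or music bases, and the names `is_divergence`, `is_nmf`, `rank`, `n_iter` and `eps` are hypothetical choices made for this example. The input `V` is assumed to be a strictly positive power spectrogram (frequency bins by frames).

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize V ~= W @ H with multiplicative updates for the IS divergence.

    V    : strictly positive array, shape (n_freq, n_frames)
    rank : number of basis vectors (columns of W)
    Returns the basis matrix W, the weight matrix H, and the divergence
    recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    history = []
    for _ in range(n_iter):
        # Update the weights H with the bases W held fixed.
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        # Update the bases W with the weights H held fixed.
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history


# Toy usage on random positive data standing in for a power spectrogram.
toy_V = np.random.default_rng(1).random((20, 30)) + 0.1
toy_W, toy_H, toy_history = is_nmf(toy_V, rank=5, n_iter=100)
print(f"IS divergence: first iteration {toy_history[0]:.3f}, last {toy_history[-1]:.3f}")
```

In a separation setting such as the one the abstract describes, some columns of `W` would model speech and others music, and the music part of `W @ H` would be discarded before recognition; how those bases are obtained and constrained is not specified here and is left out of the sketch.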