fontaine21@interspeech_2021@ISCA

#1 Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation

Authors: Mathieu Fontaine ; Kouhei Sekiguchi ; Aditya Arie Nugraha ; Yoshiaki Bando ; Kazuyoshi Yoshii

This paper proposes α-stable autoregressive fast multichannel nonnegative matrix factorization (α-AR-FastMNMF), a robust method for joint blind speech enhancement and dereverberation aimed at improving automatic speech recognition in realistic adverse environments. FastMNMF, a state-of-the-art versatile blind source separation method, assumes that the short-time Fourier transform (STFT) coefficients of a direct sound follow a circular complex Gaussian distribution with jointly diagonalizable full-rank spatial covariance matrices; it was extended to AR-FastMNMF by adding an autoregressive reverberation model. Instead of the light-tailed Gaussian distribution, we use the heavy-tailed α-stable distribution, which also has the reproductive property useful for additive source modeling, to better handle the large dynamic range of the direct sound. The experimental results show that the proposed α-AR-FastMNMF works well as a front end for an automatic speech recognition system. It outperforms α-AR-ILRMA, which is a special case of α-AR-FastMNMF, as well as their Gaussian counterparts, AR-FastMNMF and AR-ILRMA, in terms of speech signal quality metrics and word error rate.
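
The reproductive property mentioned in the abstract can be illustrated with a minimal sketch. It assumes real-valued, scalar symmetric α-stable variables, which is a simplification: the paper works with multichannel complex STFT coefficients, and nothing below is taken from the authors' implementation. The helper names `sas_char_fn` and `summed_scale` are hypothetical. The sketch checks that the sum of two independent symmetric α-stable variables with scales s1 and s2 is again symmetric α-stable with scale (s1^α + s2^α)^(1/α), first through characteristic functions and then empirically for the Cauchy case (α = 1).

```python
import numpy as np


def sas_char_fn(t, alpha, scale):
    """Characteristic function exp(-(scale*|t|)^alpha) of a real symmetric alpha-stable variable."""
    return np.exp(-(scale * np.abs(t)) ** alpha)


def summed_scale(alpha, scales):
    """Scale of a sum of independent symmetric alpha-stable variables sharing the same alpha."""
    scales = np.asarray(scales, dtype=float)
    return float(np.sum(scales ** alpha) ** (1.0 / alpha))


# Analytic check: the product of the characteristic functions of two
# independent sources equals the characteristic function of one
# alpha-stable variable with the combined scale.
alpha = 1.5
scales = [1.0, 2.0]
t = np.linspace(-5.0, 5.0, 101)
product = sas_char_fn(t, alpha, scales[0]) * sas_char_fn(t, alpha, scales[1])
combined = sas_char_fn(t, alpha, summed_scale(alpha, scales))
closed_under_sum = bool(np.allclose(product, combined))

# Empirical check for alpha = 1 (Cauchy): the median of |X| equals the scale,
# so the sum of Cauchy sources with scales 1 and 2 should have median |X| near 3.
rng = np.random.default_rng(0)
n = 200_000
mixture = 1.0 * rng.standard_cauchy(n) + 2.0 * rng.standard_cauchy(n)
empirical_scale = float(np.median(np.abs(mixture)))

print("closed under summation:", closed_under_sum)
print("predicted Cauchy scale:", summed_scale(1.0, scales))
print("empirical Cauchy scale:", round(empirical_scale, 2))
```

With α = 2 the same rule reduces to the familiar Gaussian case, where scales combine as the square root of the sum of squares; this is why additive source models remain tractable when the Gaussian assumption is replaced by a heavy-tailed α-stable one.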