kim22i@interspeech_2022@ISCA

iDeepMMSE: An improved deep learning approach to MMSE speech and noise power spectrum estimation for speech enhancement

Authors: Minseung Kim ; Hyungchan Song ; Sein Cheong ; Jong Won Shin

Deep learning approaches have been successfully applied to single-channel speech enhancement, yielding significant performance improvements. Recently, approaches that integrate deep learning into a statistical speech enhancement framework have been proposed, including Deep Xi and DeepMMSE, in which the a priori signal-to-noise ratio (SNR) is estimated by a deep neural network (DNN) and the noise power spectral density (PSD) and spectral gain function are computed from the estimated parameters. In this paper, we propose an improved DeepMMSE (iDeepMMSE), which uses a DNN to estimate the speech PSD and the speech presence probability in addition to the a priori SNR, for MMSE estimation of the speech and noise PSDs. The a priori and a posteriori SNRs are refined with the estimated PSDs and are then used to compute the spectral gain function. We also replace the DNN architecture with the Conformer, which efficiently captures both local and global sequential information. Experimental results on the Voice Bank-DEMAND dataset and the Deep Xi dataset show that the proposed iDeepMMSE outperforms DeepMMSE in terms of perceptual evaluation of speech quality (PESQ) scores and composite objective measures.
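
To make the relationship between the PSDs, the two SNRs, and a spectral gain concrete, here is a minimal sketch using the standard textbook definitions: the a priori SNR is the ratio of speech PSD to noise PSD, and the a posteriori SNR is the ratio of the noisy power spectrum to the noise PSD. The abstract does not say which gain function iDeepMMSE uses, so the Wiener gain below is an assumption chosen for illustration only; the function name `snrs_and_wiener_gain`, its parameters, and the sample values are likewise hypothetical and not taken from the paper.

```python
import numpy as np


def snrs_and_wiener_gain(noisy_power, speech_psd, noise_psd, eps=1e-12):
    """Compute a priori SNR, a posteriori SNR, and a Wiener gain per bin.

    Illustrative only: the Wiener gain is an assumed stand-in, not
    necessarily the gain function used by iDeepMMSE.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    speech_psd = np.asarray(speech_psd, dtype=float)
    # Floor the noise PSD to avoid division by zero.
    noise_psd = np.maximum(np.asarray(noise_psd, dtype=float), eps)

    xi = speech_psd / noise_psd      # a priori SNR
    gamma = noisy_power / noise_psd  # a posteriori SNR
    gain = xi / (1.0 + xi)           # Wiener gain (assumed for illustration)
    return xi, gamma, gain


# Hypothetical values for two frequency bins.
xi, gamma, gain = snrs_and_wiener_gain(
    noisy_power=[4.0, 1.0],
    speech_psd=[3.0, 0.0],
    noise_psd=[1.0, 1.0],
)
```

The enhanced magnitude spectrum would then be obtained by multiplying the noisy magnitude by `gain` in each bin; a bin with zero estimated speech PSD receives a gain of zero under this choice.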