Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition

Authors: Zhihao Du, Jiqing Han, Xueliang Zhang

To improve the noise robustness of automatic speech recognition (ASR), generative adversarial network (GAN) based enhancement methods are employed as front-end processing. These methods comprise a single adversarial process between an enhancement model and a discriminator. In this single adversarial process, the discriminator is encouraged to find differences between enhanced and clean speech, but the distribution of clean speech is ignored. In this paper, we propose a double adversarial network (DAN) that adds another adversarial generation process (AGP), which forces the discriminator not only to find the differences but also to model the distribution. Furthermore, a functional mean square error (f-MSE) is proposed to utilize the representations learned by the discriminator. Experimental results reveal that AGP and f-MSE, both missing from previous GAN-based methods, are crucial for enhancement performance on the ASR task. Specifically, our DAN achieves a 13.00% relative word error rate improvement over noisy speech on the CHiME-2 test set, significantly outperforming several recent GAN-based enhancement methods.
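
For readers unfamiliar with the metric, a relative word error rate (WER) improvement is usually computed as the WER reduction divided by the baseline WER. The short Python sketch below illustrates that arithmetic only. The function name `relative_wer_improvement` and the WER values are hypothetical, chosen so the result comes out to 13%; they are not the absolute WERs reported in the paper, which the abstract does not state.

```python
def relative_wer_improvement(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction, in percent, of enhanced over baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical values for illustration only (not taken from the paper).
noisy_wer = 20.0      # assumed WER (%) when recognizing unprocessed noisy speech
enhanced_wer = 17.4   # assumed WER (%) after front-end enhancement

print(round(relative_wer_improvement(noisy_wer, enhanced_wer), 2))  # prints 13.0
```

Under this definition, the same relative improvement can correspond to very different absolute WER reductions, depending on the baseline.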

Subject: INTERSPEECH.2020 - Speech Recognition

du20@interspeech_2020@ISCA
