ni24@interspeech_2024@ISCA

MSA-DPCRN: A Multi-Scale Asymmetric Dual-Path Convolution Recurrent Network with Attentional Feature Fusion for Acoustic Echo Cancellation

Authors: Ye Ni, Cong Pang, Chengwei Huang, Cairong Zou

Echo cancellation plays a crucial role in modern speech applications. Numerous deep-learning models have been developed for acoustic echo cancellation (AEC) and have made considerable progress by incorporating additional features; however, most of these models overlook the distinct characteristics of the different features and simply concatenate them along the channel dimension. In this paper, we propose a multi-scale asymmetric dual-path convolution recurrent network (MSA-DPCRN) consisting of two asymmetric encoding paths that extract spectral features and relevant features from the input reference and microphone signals. Moreover, we propose a frequency-wise attentional feature fusion (AFF) method that fuses the two features while maintaining their original dynamic range. Experiments validate the effectiveness of each component of MSA-DPCRN and show that our model outperforms the AEC Challenge baseline in terms of Echo-MOS metrics.
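The abstract does not say how the frequency-wise AFF computes its weights. The NumPy sketch below is one plausible reading, not the authors' implementation. Everything in it is an assumption: the function name `frequency_wise_aff`, the (channels, time, frequency) layout, the time-only average pooling, the two-matrix bottleneck (`w_down`, `w_up`), and the sigmoid gate. The gate yields one weight per channel and frequency bin. The two feature maps are then mixed as a convex combination, so every fused value lies between the corresponding input values. That is one simple way to fuse without expanding the dynamic range.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def frequency_wise_aff(x: np.ndarray, y: np.ndarray,
                       w_down: np.ndarray, w_up: np.ndarray):
    """Hypothetical frequency-wise attentional fusion of two feature maps.

    x, y   : feature maps of shape (C, T, F) from the two encoder paths (assumed layout)
    w_down : (C_r, C) bottleneck projection (assumed, learned in practice)
    w_up   : (C, C_r) expansion projection (assumed, learned in practice)
    Returns the fused map (C, T, F) and the gate weights (C, F).
    """
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    # Sum the two streams, then average over time only, so that a separate
    # descriptor survives for every frequency bin: shape (C, F).
    pooled = (x + y).mean(axis=1)
    # Channel bottleneck applied independently at each frequency bin.
    hidden = np.maximum(w_down @ pooled, 0.0)   # ReLU, shape (C_r, F)
    weights = sigmoid(w_up @ hidden)            # gate in (0, 1), shape (C, F)
    w = weights[:, None, :]                     # broadcast over time: (C, 1, F)
    # Convex combination: each output lies between the corresponding x and y.
    fused = w * x + (1.0 - w) * y
    return fused, weights


rng = np.random.default_rng(0)
C, T, F, C_r = 8, 10, 16, 2
x = rng.standard_normal((C, T, F))
y = 5.0 * rng.standard_normal((C, T, F))
fused, weights = frequency_wise_aff(
    x, y,
    0.1 * rng.standard_normal((C_r, C)),
    0.1 * rng.standard_normal((C, C_r)),
)
print(fused.shape, weights.shape)
```

A convex gate is a deliberate choice in this sketch. A plain weighted sum with unconstrained weights could push values outside the range of either input, whereas the sigmoid-bounded mix cannot.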

Subject: INTERSPEECH.2024 - Speech Processing