ni24@interspeech_2024@ISCA


#1 MSA-DPCRN: A Multi-Scale Asymmetric Dual-Path Convolution Recurrent Network with Attentional Feature Fusion for Acoustic Echo Cancellation

Authors: Ye Ni ; Cong Pang ; Chengwei Huang ; Cairong Zou

Echo cancellation plays a crucial role in modern speech applications. Numerous deep-learning models have been developed for the echo cancellation task and have made great progress by incorporating additional features; however, most of these models overlook the distinct characteristics of the different features and simply merge them along the channel dimension. In this paper, we propose a multi-scale asymmetric dual-path convolution recurrent network (MSA-DPCRN) with two asymmetric encoding paths that extract spectrum and relevant features from the input reference and microphone signals. Moreover, we propose a frequency-wise attentional feature fusion (AFF) method that fuses the two feature sets while maintaining the original dynamic range. Experiments validate the effectiveness of each component of MSA-DPCRN and indicate that our model outperforms the AEC challenge baseline in terms of the Echo-MOS metrics.
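The abstract does not spell out how the frequency-wise fusion is computed, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' AFF module. It assumes two feature maps of shape (channels, time, frequency), pools them over time, and derives a single gate per frequency bin that blends the two maps as a convex combination. The function name `frequency_wise_fusion`, the projection weights `w_x` and `w_y`, and the scalar `bias` are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, maps real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def frequency_wise_fusion(x, y, w_x, w_y, bias=0.0):
    """Blend two feature maps with one attention gate per frequency bin.

    x, y     : arrays of shape (channels, time, freq)
    w_x, w_y : arrays of shape (channels,), hypothetical learned weights
    bias     : scalar, hypothetical learned offset
    Returns the fused map (same shape as x) and the gate of shape (freq,).
    """
    # Summarise each map over the time axis: (channels, freq)
    x_pool = x.mean(axis=1)
    y_pool = y.mean(axis=1)

    # Project the pooled channels to one logit per frequency bin: (freq,)
    logits = w_x @ x_pool + w_y @ y_pool + bias
    gate = sigmoid(logits)

    # Convex combination; the gate broadcasts over channels and time
    fused = gate * x + (1.0 - gate) * y
    return fused, gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mic_feat = rng.normal(size=(4, 10, 8))   # assumed microphone-path features
    ref_feat = rng.normal(size=(4, 10, 8))   # assumed reference-path features
    fused, gate = frequency_wise_fusion(
        mic_feat, ref_feat, rng.normal(size=4), rng.normal(size=4)
    )
    print(fused.shape, gate.shape)
```

Because the gate lies strictly between 0 and 1, every fused value falls between the two corresponding input values, which is one simple way to avoid the range growth that summation or channel concatenation followed by mixing can introduce. Whether this matches what the paper means by maintaining the original dynamic range is an assumption.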