le25b@interspeech_2025@ISCA

#1 Multistage Universal Speech Enhancement System for URGENT Challenge

Authors: Xiaohuai Le, Zhuangqi Chen, Siyu Sun, Xianjun Xia, Chuanzeng Huang

Various distortions can occur during audio transmission and processing. To address them, we developed a multistage universal speech enhancement system consisting of four submodules: audio declipping, packet loss compensation, audio separation, and spectral inpainting. These modules operate across the time, sub-band, and time-frequency domains. We employed a pretrain-finetune training paradigm and introduced a self-distillation method to further improve performance. Experiments on large-scale datasets show that the system achieves strong results across multiple evaluation metrics, particularly in subjective speech quality. The proposed system ranked 1st in the URGENT 2024 challenge with a MOS of 3.52 and placed 4th in the second track of the URGENT 2025 challenge.

Subject: INTERSPEECH.2025 - Modelling and Learning