gao25@interspeech_2025@ISCA

#1 TSDT-Net: Ultra-Low-Complexity Two-Stage Model Combining Dual-Path-Transformer and Transform-Average-Concatenate Network for Speech Enhancement

Authors: Yi Gao, Hangting Chen, Siyu Zhang, Qingshan Yang, Jingcong Chen

This paper introduces TSDT-Net, a two-stage, ultra-low-complexity architecture for speech enhancement that improves denoising performance while keeping the parameter count and computational cost low. Its first stage uses a simplified Dual-Path-Transformer (DPT) structure. In the second stage, the first-stage output and the original noisy signal are treated as a dual-channel input, and the task is modeled as a beamforming optimization problem. An enhanced Transform-Average-Concatenate (TAC) network processes these two channels through spectral filtering and enhancement. Fast linear transformers keep the computational overhead ultra-low, while gated networks in both stages facilitate construction of the complex Ideal Ratio Mask (cIRM). Residual connections between the stages let the two stages reinforce each other. Evaluations on the INTERSPEECH 2020 DNS Challenge demonstrate TSDT-Net's superiority: it achieves state-of-the-art DNSMOS and PESQ scores, with significant margins over single-stage models, under stringent computational constraints (<700K parameters and <500M/200M MACs). This efficiency enables deployment across diverse embedded devices.
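
To make the second-stage idea more concrete, the following is a minimal NumPy sketch of a generic Transform-Average-Concatenate step applied to two channels (the first-stage output and the noisy input). It is not the paper's enhanced TAC network: the function name `tac_block`, the ReLU activations, the layer sizes, and the random weights are all assumptions chosen for illustration, and the abstract does not specify how the authors' variant differs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tac_block(z, w_p, w_q, w_r):
    """Generic TAC step (illustrative; not the paper's enhanced variant).

    z: array of shape (channels, frames, features).
    """
    # Transform: the same linear layer is applied to every channel.
    f = relu(z @ w_p)                              # (C, T, H)
    # Average: pool across channels, then transform the pooled feature.
    f_avg = relu(f.mean(axis=0) @ w_q)             # (T, H)
    # Concatenate: pair each channel's feature with the shared average.
    f_avg = np.broadcast_to(f_avg, f.shape)        # (C, T, H)
    cat = np.concatenate([f, f_avg], axis=-1)      # (C, T, 2H)
    # Project back to the input size and add a residual connection.
    return z + relu(cat @ w_r)                     # (C, T, F)

# Hypothetical sizes: 2 channels, 5 frames, 8 features, 16 hidden units.
rng = np.random.default_rng(0)
C, T, F, H = 2, 5, 8, 16
z = rng.standard_normal((C, T, F))   # stand-in for [stage-1 output, noisy input]
w_p = rng.standard_normal((F, H)) * 0.1
w_q = rng.standard_normal((H, H)) * 0.1
w_r = rng.standard_normal((2 * H, F)) * 0.1

out = tac_block(z, w_p, w_q, w_r)
print(out.shape)
```

Because the only cross-channel operation is a mean, swapping the two input channels simply swaps the two output channels, which is the property that makes TAC-style blocks convenient for multi-channel inputs.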

Subject: INTERSPEECH.2025 - Speech Processing