Biophysically-inspired single-channel speech enhancement in the time domain

Most state-of-the-art speech enhancement (SE) methods take time-frequency (T-F) representations or raw waveforms as input and generalize poorly at negative signal-to-noise ratios (SNRs). To address this limitation, we propose a novel network that incorporates biophysical properties of the human auditory system, which is known to perform well even at negative SNRs. We generated biophysical features using CoNNear, a neural-network-based auditory model, and fed them into AECNN, a state-of-the-art (SOTA) SE model. The model was trained on the INTERSPEECH 2021 DNS Challenge dataset and evaluated under mismatched noise conditions at various SNRs. The experimental results showed that the bio-inspired approaches outperformed T-F and waveform features at positive SNRs and were more robust to unseen noise at negative SNRs. We conclude that incorporating human-like features can extend the operating range of SE systems to more negative SNRs.
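Evaluating at "various SNRs", including negative ones, requires mixing clean speech and noise at a controlled level. The abstract does not describe how the test mixtures were built, so the following is only a minimal sketch of the conventional approach. It assumes SNR is defined as the ratio of mean speech power to mean noise power in dB, and solves for a noise gain g such that 10·log10(P_speech / (g²·P_noise)) equals the target SNR. The names `mix_at_snr`, `snr_db` and `mean_power` are hypothetical, and a pure tone stands in for a real speech signal.

```python
import math
import random


def mean_power(x):
    """Mean power (average of squared samples) of a signal given as a list of floats."""
    return sum(s * s for s in x) / len(x)


def snr_db(speech, noise):
    """SNR in dB, assuming the ratio-of-mean-powers definition."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Hypothetical helper: scale `noise` so the mixture has the target SNR, then add it.

    Assumes equal-length signals with non-zero power. Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_s = mean_power(speech)
    p_n = mean_power(noise)
    # Solve p_s / (gain**2 * p_n) == 10 ** (target_snr_db / 10) for gain.
    gain = math.sqrt(p_s / (p_n * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000  # assumed sample rate, one second of audio
    # Toy stand-in for speech: a 440 Hz tone.
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
    for target in (10, 0, -5):
        _, scaled = mix_at_snr(speech, noise, target)
        print(f"target {target:+d} dB -> measured {snr_db(speech, scaled):.3f} dB")
```

Because only the noise is rescaled, the speech level stays fixed across conditions and lower SNRs simply mean louder noise. Real SE pipelines often use an active-speech-level measure instead of plain mean power, which would change how the gain is computed.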

Subject: INTERSPEECH.2023 - Speech Processing

wen23b@interspeech_2023@ISCA
