zhang21b@interspeech_2021@ISCA

#1 Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement

Authors: Qiquan Zhang ; Qi Song ; Aaron Nicolson ; Tian Lan ; Haizhou Li

Despite much progress, most speech enhancement models based on temporal convolutional networks (TCNs) focus mainly on modeling the long-term temporal contextual dependencies of speech frames, without taking into account the distribution information of the speech signal in the frequency dimension. In this study, we propose a frequency dimension adaptive attention (FAA) mechanism to improve TCNs, which guides the model to selectively emphasize the frequency-wise features that carry important speech information and also improves the representation capability of the network. Our extensive experimental investigation demonstrates that the proposed FAA mechanism consistently provides significant improvements in terms of speech quality (PESQ), intelligibility (STOI), and three other composite metrics. More promisingly, it generalizes better to real-world noisy environments.
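
The abstract does not specify how FAA is computed, so the following is only a minimal sketch of the general idea of frequency-wise attention, not the authors' design. It assumes a simple gating scheme: each frequency index is summarized by its mean over time, passed through a sigmoid gate, and used to rescale the features. The function name `frequency_attention` and the parameters `weights` and `biases` are hypothetical, introduced here for illustration.

```python
import math


def frequency_attention(frames, weights, biases):
    """Rescale each frequency-wise feature by a gate in (0, 1).

    frames:  list of T frames, each a list of F feature values
    weights, biases: hypothetical per-frequency gate parameters (length F);
                     in a trained model these would be learned.
    NOTE: an illustrative assumption, not the FAA mechanism from the paper.
    """
    num_frames = len(frames)
    num_freqs = len(frames[0])

    # Summarize each frequency index by its mean over all time frames.
    means = [
        sum(frame[f] for frame in frames) / num_frames
        for f in range(num_freqs)
    ]

    # One sigmoid gate per frequency index.
    gates = [
        1.0 / (1.0 + math.exp(-(weights[f] * means[f] + biases[f])))
        for f in range(num_freqs)
    ]

    # Emphasize or suppress each frequency-wise feature by its gate.
    scaled = [
        [frame[f] * gates[f] for f in range(num_freqs)]
        for frame in frames
    ]
    return scaled, gates
```

In this sketch, a frequency index whose gate is close to 1 passes through nearly unchanged, while one whose gate is close to 0 is suppressed. Averaging over all frames makes the sketch non-causal; a real-time variant would need a different summary statistic.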