Spoofing Speech Detection by Modeling Local Spectro-Temporal and Long-term Dependency

Authors: Haochen Wu, Wu Guo, Zhentao Zhang, Wenting Zhao, Shengyu Peng, Jie Zhang

In this work, a dual-branch network is proposed to exploit both local and global information in utterances for spoofing speech detection (SSD). The local artifacts of spoofing speech can reside in specific temporal or spectral regions, and capturing them is the primary objective of SSD systems. We propose a spectro-temporal graph attention network to jointly capture the temporal and spectral differences of spoofing speech. Unlike existing methods, the proposed method uses a cross-attention mechanism to bridge the spectro-temporal dependency. As global artifacts can also provide complementary information for SSD, we use a BiLSTM-based branch to model long-term temporal discriminative cues. The two branches are optimized separately with a weighted cross-entropy loss, and their scores are fused with equal weights. Results on three benchmark datasets (ASVspoof 2019, ASVspoof 2021 LA, and ASVspoof 2021 DF) show that the proposed method outperforms advanced existing systems.
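The abstract names three concrete mechanisms: cross-attention between temporal and spectral information, a weighted cross-entropy loss per branch, and equal-weight score fusion. The sketch below illustrates generic versions of each under stated assumptions; it is not the authors' implementation. The cross-attention is plain scaled dot-product attention with no learned projections (the paper's spectro-temporal graph attention layer is not reproduced), and the loss is normalised the way PyTorch's `CrossEntropyLoss` does when given a `weight` argument. The class weights `(0.1, 0.9)`, the label convention (0 = spoof, 1 = bona fide), and the use of "bona fide logit minus spoof logit" as a branch score are assumptions chosen for illustration, not values from the paper. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

# Assumed label convention and class weights (illustrative, not from the paper).
SPOOF, BONAFIDE = 0, 1
CLASS_WEIGHTS = (0.1, 0.9)


def cross_attention(queries: Sequence[Vector], context: Sequence[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention: each query node (e.g. a temporal node)
    attends over all context nodes (e.g. spectral nodes). Keys and values are the
    context vectors themselves; no learned projections are used in this sketch."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every context node, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        # Numerically stable softmax over the context nodes.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted sum of the context vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return outputs


def log_softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable log-softmax over one utterance's class logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def weighted_cross_entropy(batch_logits, labels, class_weights) -> float:
    """Class-weighted cross-entropy averaged by the sum of the target-class
    weights (the normalisation PyTorch's CrossEntropyLoss uses with `weight`)."""
    loss_sum = 0.0
    weight_sum = 0.0
    for logits, label in zip(batch_logits, labels):
        w = class_weights[label]
        loss_sum += -w * log_softmax(logits)[label]
        weight_sum += w
    return loss_sum / weight_sum


def branch_score(logits: Sequence[float]) -> float:
    """Assumed per-branch detection score: bona fide logit minus spoof logit
    (higher means more likely bona fide)."""
    return logits[BONAFIDE] - logits[SPOOF]


def fuse_scores(local_score: float, global_score: float) -> float:
    """Equal-weight fusion of the local (graph) and global (BiLSTM) branch scores."""
    return 0.5 * local_score + 0.5 * global_score


if __name__ == "__main__":
    temporal_nodes = [[1.0, 0.0], [0.0, 1.0]]
    spectral_nodes = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
    print(cross_attention(temporal_nodes, spectral_nodes))

    batch_logits = [[2.0, -1.0], [-0.5, 1.5]]
    print(weighted_cross_entropy(batch_logits, [SPOOF, BONAFIDE], CLASS_WEIGHTS))

    # Local branch score 1.0, global branch score 0.5, fused score 0.75.
    print(fuse_scores(branch_score([0.0, 1.0]), branch_score([0.5, 1.0])))
```

Because each branch is trained with its own loss, fusion happens only at the score level; averaging with equal weights keeps the fused score on the same scale as the individual branch scores.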

Subject: INTERSPEECH.2024 - Speech Detection

wu24b@interspeech_2024@ISCA
