Textual-Driven Adversarial Purification for Speaker Verification

Authors: Sizhou Chen, Yibo Bai, Jiadi Yao, Xiao-Lei Zhang, Xuelong Li

Adversarial attacks introduce subtle perturbations to audio signals to cause automatic speaker verification (ASV) systems to make mistakes. To address this challenge, adversarial purification techniques have emerged, among which diffusion models have proven effective. However, recent diffusion-based approaches have a drawback: the quality of the generated audio is not high enough. Moreover, these approaches tend to focus solely on audio features and often neglect textual information. To overcome these limitations, we propose a textual-driven adversarial purification (TDAP) framework, which integrates diffusion models with pretrained large audio language models for comprehensive defense. TDAP employs textual data extracted from audio to guide the diffusion-based purification process. Extensive experimental results show that TDAP significantly enhances defense robustness against adversarial attacks.

Subject: INTERSPEECH.2024 - Speech Detection

chen24@interspeech_2024@ISCA
