ge-attacks@usenixsecurity24@USENIX

Total: 1

#1 More Simplicity for Trainers, More Opportunity for Attackers: Black-Box Attacks on Speaker Recognition Systems by Inferring Feature Extractor [PDF] [Copy] [Kimi1] [REL]

Authors: Yunjie Ge ; Pinji Chen ; Qian Wang ; Lingchen Zhao ; Ningping Mou ; Peipei Jiang ; Cong Wang ; Qi Li ; Chao Shen

Recent studies have revealed that deep learning-based speaker recognition systems (SRSs) are vulnerable to adversarial examples (AEs). However, the practicality of existing black-box AE attacks is restricted by the requirement for extensive querying of the target system or the limited attack success rates (ASR). In this paper, we introduce VoxCloak, a new targeted AE attack with superior performance in both these aspects. Distinct from existing methods that optimize AEs by querying the target model, VoxCloak initially employs a small number of queries (e.g., a few hundred) to infer the feature extractor used by the target system. It then utilizes this feature extractor to generate any number of AEs locally without the need for further queries. We evaluate VoxCloak on four commercial speaker recognition (SR) APIs and seven voice assistants. On the SR APIs, VoxCloak surpasses the existing transfer-based attacks, improving ASR by 76.25% and signal-to-noise ratio (SNR) by 13.46 dB, as well as the decision-based attacks, requiring 33 times fewer queries and improving SNR by 7.87 dB while achieving comparable ASRs. On the voice assistants, VoxCloak outperforms the existing methods with a 49.40% improvement in ASR and a 15.79 dB improvement in SNR.