In black-box attacks on speaker recognition systems, adversarial examples can transfer better to an unseen victim system if they consistently spoof an ensemble of substitute models. In this work, we propose a gradient-aligned ensemble attack method that uses a set of substitute models to find the optimal gradient direction for updating the adversarial example. Specifically, we first calculate an overfitting-reduced gradient for each substitute model by randomly masking some regions of the input acoustic features. We then weight each substitute model's gradient according to its consistency with the gradients of the other substitute models. The final update gradient is the weighted sum of the gradients over all substitute models. Experimental results on the VoxCeleb dataset verify the effectiveness of the proposed approach on both the speaker identification and speaker verification tasks.
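The three steps described above (masked gradients, consistency-based weights, weighted sum) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so several choices here are assumptions made purely for illustration: element-wise Bernoulli masking averaged over a few samples, consistency measured as the mean cosine similarity to the other models' gradients, and a softmax to turn those scores into weights. The names `masked_gradient`, `alignment_weights`, `ensemble_gradient`, `mask_prob`, `n_samples`, and `temperature` are hypothetical, and each substitute model is represented only by a callable that returns the loss gradient with respect to the input features.

```python
import numpy as np


def masked_gradient(grad_fn, x, rng, mask_prob=0.2, n_samples=4):
    """Average the input gradient over randomly masked copies of x.

    Assumption: masking is element-wise Bernoulli; the abstract only says
    "some regions" of the acoustic features are masked.
    """
    grads = []
    for _ in range(n_samples):
        mask = (rng.random(x.shape) >= mask_prob).astype(x.dtype)
        # Chain rule: d/dx L(x * mask) = mask * (dL/dz evaluated at z = x * mask).
        grads.append(grad_fn(x * mask) * mask)
    return np.mean(grads, axis=0)


def alignment_weights(grads, temperature=1.0):
    """Weight each gradient by its mean cosine similarity to the others.

    Assumption: a softmax over the similarity scores; the abstract does not
    state the actual weighting rule.
    """
    flat = np.stack([g.ravel() for g in grads])
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)
    cos = unit @ unit.T
    k = len(grads)
    # Exclude each gradient's similarity with itself.
    mean_cos = (cos.sum(axis=1) - np.diag(cos)) / max(k - 1, 1)
    logits = mean_cos / temperature
    logits = logits - logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()


def ensemble_gradient(grad_fns, x, rng, mask_prob=0.2, n_samples=4):
    """Return the weighted sum of per-model masked gradients and the weights."""
    grads = [masked_gradient(f, x, rng, mask_prob, n_samples) for f in grad_fns]
    w = alignment_weights(grads)
    combined = sum(wi * gi for wi, gi in zip(w, grads))
    return combined, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((40, 10))  # toy (feature_dim, frames) input
    # Toy stand-ins for substitute models: linear losses with constant gradients.
    directions = [np.ones_like(x), 2.0 * np.ones_like(x), -np.ones_like(x)]
    grad_fns = [lambda z, d=d: d for d in directions]
    g, w = ensemble_gradient(grad_fns, x, rng)
    print("weights:", w)
```

In this toy setup the third model's gradient points opposite to the other two, so it receives the smallest weight. In an actual attack, each callable would backpropagate through a real substitute speaker model, and the combined gradient would feed an iterative update of the adversarial example.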