Towards Semantics- and Domain-Aware Adversarial Attacks

Authors: Jianping Zhang, Yung-Chieh Huang, Weibin Wu, Michael R. Lyu

Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pre-training loss in circular order. Comprehensive experimental comparisons confirm the superiority of our approach. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.
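The greedy replacement step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: `greedy_word_attack`, the deletion-based importance score, the toy victim classifier `toy_predict_proba`, and the dictionary-backed `toy_suggest`, which stands in for the paper's semantics- and domain-aware language model.

```python
from typing import Callable, List

def greedy_word_attack(
    words: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], List[float]],
    suggest: Callable[[List[str], int], List[str]],
) -> List[str]:
    """Greedily substitute important words until the predicted label flips."""
    base = predict_proba(words)[true_label]

    # Assumption: importance = drop in true-label probability when the word is deleted.
    def importance(i: int) -> float:
        reduced = words[:i] + words[i + 1:]
        return base - predict_proba(reduced)[true_label]

    # Visit positions from most to least important.
    order = sorted(range(len(words)), key=importance, reverse=True)
    current = list(words)
    for i in order:
        best_words, best_p = current, predict_proba(current)[true_label]
        # Keep the candidate that lowers the true-label probability the most.
        for cand in suggest(current, i):
            trial = current[:i] + [cand] + current[i + 1:]
            p = predict_proba(trial)[true_label]
            if p < best_p:
                best_words, best_p = trial, p
        current = best_words
        probs = predict_proba(current)
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            break  # the victim now predicts a different label
    return current

# Hypothetical toy victim: returns [p_negative, p_positive] from keyword counts.
POSITIVE = {"great", "wonderful"}
NEGATIVE = {"awful"}

def toy_predict_proba(words: List[str]) -> List[float]:
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    p_pos = min(1.0, max(0.0, 0.4 + 0.3 * score))
    return [1.0 - p_pos, p_pos]

# Hypothetical stand-in for the language model's substitute suggestions.
SUBSTITUTES = {"great": ["fine", "decent"], "wonderful": ["nice"]}

def toy_suggest(words: List[str], i: int) -> List[str]:
    return SUBSTITUTES.get(words[i], [])

adversarial = greedy_word_attack(["a", "great", "movie"], 1, toy_predict_proba, toy_suggest)
print(adversarial)
```

In this toy setting the attack only needs to change "great", since it is the single word the victim relies on. In the method the abstract describes, the candidate list would instead come from a language model trained with contrastive learning and in-domain pre-training, so that substitutes preserve meaning and stay in-domain.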

Subject: IJCAI.2023 - AI Ethics, Trust, Fairness

