BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

#1 BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models [PDF³] [Copy] [Kimi³] [REL]

Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

Multi-modal large language models (MLLMs) extend large language models (LLMs) to process multi-modal information, enabling them to generate responses to image-text inputs. MLLMs have been incorporated into diverse multi-modal applications, such as autonomous driving and medical diagnosis, via plug-and-play without fine-tuning. This deployment paradigm increases the vulnerability of MLLMs to backdoor attacks. However, existing backdoor attacks against MLLMs achieve limited effectiveness and stealthiness. In this work, we propose $\textit{BadToken}$, the first token-level backdoor attack to MLLMs. BadToken introduces two novel backdoor behaviors: $\textit{Token-substitution}$ and $\textit{Token-addition}$, which enable flexible and stealthy attacks by making token-level modifications to the original output for backdoored inputs. We formulate a general optimization problem that considers the two backdoor behaviors to maximize the attack effectiveness. We evaluate BadToken on two open-source MLLMs and various tasks. Our results show that our attack maintains the model's utility while achieving high attack success rates and stealthiness. We also show the real-world threats of BadToken in two scenarios, i.e., autonomous driving and medical diagnosis. Furthermore, we consider defenses including fine-tuning and input purification. Our results highlight the threat of our attack.

Subject: CVPR.2025 - Poster

Yuan_BadToken_Token-level_Backdoor_Attacks_to_Multi-modal_Large_Language_Models@CVPR2025@CVF

#1 BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models [PDF3] [Copy] [Kimi3] [REL]

#1 BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models [PDF³] [Copy] [Kimi³] [REL]