Consensus-Robust Transfer Attacks via Parameter and Representation Perturbations

Authors: Shixin Li, Zewei Li, Xiaojing Ma, Xiaofan Bai, Pingyi Hu, Dongmei Zhang, Bin Benjamin Zhu

Adversarial examples crafted on one model often exhibit poor transferability to others, hindering their effectiveness in black-box settings. This limitation arises from two key factors: (i) \emph{decision-boundary variation} across models and (ii) \emph{representation drift} in feature space. We address these challenges through a new perspective that frames transferability for \emph{untargeted attacks} as a \emph{consensus-robust optimization} problem: adversarial perturbations should remain effective across a neighborhood of plausible target models. To model this uncertainty, we introduce two complementary perturbation channels: a \emph{parameter channel}, which captures boundary shifts via weight perturbations, and a \emph{representation channel}, which addresses feature drift via stochastic blending of clean and adversarial activations. We then propose \emph{CORTA} (COnsensus-Robust Transfer Attack), a lightweight attack instantiated from this robust formulation using two first-order strategies: (i) sensitivity regularization based on the squared Frobenius norm of the logits’ Jacobian with respect to the model weights, and (ii) Monte Carlo sampling over blended feature representations. Our theoretical analysis provides a certified lower bound linking these approximations to the robust objective. Extensive experiments on CIFAR-100 and ImageNet show that CORTA significantly outperforms state-of-the-art transfer-based methods, including ensemble approaches, across CNN and Vision Transformer targets. Notably, CORTA achieves a \emph{19.1 percentage-point gain in transfer success rate over the best prior method} while using only a single surrogate model.
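The abstract describes the attack only at a high level, so the PyTorch sketch below (assuming PyTorch >= 2.0 for torch.func.functional_call) is one plausible first-order instantiation of the two channels, not the authors' code. All names here (`corta_like_attack`, `feat_layer`, the hyperparameters) are illustrative assumptions: the parameter channel is modeled as Gaussian weight noise applied through functional_call, the representation channel as a forward hook that blends clean and adversarial activations with a random coefficient, and the weight-Jacobian Frobenius-norm regularizer as a finite-difference Hutchinson estimate.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def corta_like_attack(model, feat_layer, x, y, eps=8/255, alpha=2/255,
                      steps=10, mc_samples=4, sigma=1e-3, reg_weight=1.0):
    """Hypothetical consensus-robust untargeted attack, PGD-style."""
    model.eval()
    params = {k: v.detach() for k, v in model.named_parameters()}

    # Cache clean activations at the chosen layer (representation channel).
    clean = {}
    hook = feat_layer.register_forward_hook(
        lambda m, i, o: clean.__setitem__("a", o.detach()))
    with torch.no_grad():
        model(x)
    hook.remove()

    # During the attack, blend adversarial and clean activations with a
    # random coefficient lam; lam = 1 makes the hook a no-op.
    state = {"lam": 1.0}
    hook = feat_layer.register_forward_hook(
        lambda m, i, o: state["lam"] * o + (1.0 - state["lam"]) * clean["a"])

    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)

        # Monte Carlo average over both channels: random weight noise
        # (parameter channel) and random feature blending (representation).
        loss = x_adv.new_zeros(())
        for _ in range(mc_samples):
            state["lam"] = float(torch.rand(()))
            noisy = {k: v + sigma * torch.randn_like(v)
                     for k, v in params.items()}
            loss = loss + F.cross_entropy(
                functional_call(model, noisy, (x_adv,)), y)
        loss = loss / mc_samples

        # Finite-difference Hutchinson estimate of ||d logits / d theta||_F^2:
        # for u ~ N(0, I), E||(f(theta + sigma*u) - f(theta)) / sigma||^2
        # approximates the squared Frobenius norm of the weight Jacobian.
        state["lam"] = 1.0
        shifted = {k: v + sigma * torch.randn_like(v)
                   for k, v in params.items()}
        diff = (functional_call(model, shifted, (x_adv,))
                - functional_call(model, params, (x_adv,)))
        reg = (diff.pow(2).sum(dim=1) / sigma ** 2).mean()

        # Ascend the robust objective: high loss, low weight sensitivity.
        grad = torch.autograd.grad(loss - reg_weight * reg, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    hook.remove()
    return x_adv.detach()
```

The regularizer relies on the identity E_u[||J u||^2] = ||J||_F^2 for u ~ N(0, I), so the finite difference (f(theta + sigma*u) - f(theta)) / sigma ~ J u estimates the squared Frobenius norm as sigma goes to 0; ascending on loss minus reg_weight * reg favors perturbations that both fool the surrogate and stay effective under small boundary shifts.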

Subject: NeurIPS.2025 - Poster