PdBEggnDIl@OpenReview


Adaptive Median Smoothing: Adversarial Defense for Unlearned Text-to-Image Diffusion Models at Inference Time

Authors: XIAOXUAN HAN, Songlin Yang, Wei Wang, Yang Li, JING DONG

Text-to-image (T2I) diffusion models have raised concerns about generating inappropriate content, such as "*nudity*". Despite efforts to erase undesirable concepts through unlearning techniques, these unlearned models remain vulnerable to adversarial inputs that can regenerate such content. To safeguard unlearned models, we propose a novel inference-time defense strategy that mitigates the impact of adversarial inputs. Specifically, we first reformulate the challenge of ensuring robustness in unlearned diffusion models as a robust regression problem. Building upon naive median smoothing for regression robustness, which employs isotropic Gaussian noise, we develop a generalized median smoothing framework that incorporates anisotropic noise. Based on this framework, we introduce a token-wise ***Adaptive Median Smoothing*** method that dynamically adjusts noise intensity according to each token's relevance to target concepts. Furthermore, to improve inference efficiency, we explore implementations of this adaptive method at the text-encoding stage. Extensive experiments demonstrate that our approach enhances adversarial robustness while preserving model utility and inference efficiency, outperforming baseline defense techniques.

Subject: ICML.2025 - Poster
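The abstract gives no formulas, so what follows is only a minimal NumPy sketch of the general idea, under stated assumptions. Median smoothing estimates a robust output by taking the median of a function `f` over Gaussian-perturbed copies of its input; isotropic noise uses one `sigma` everywhere, while anisotropic noise allows a different `sigma` per coordinate. The names `median_smooth`, `token_sigmas`, and `adaptive_median_smooth` are hypothetical, `f` stands in for whatever component is being smoothed (for example a text encoder acting on token embeddings), and both the cosine-similarity relevance score and the `base_sigma * (1 + scale * relevance)` noise schedule are illustrative assumptions, not the authors' method.

```python
import numpy as np


def median_smooth(f, x, sigma, n_samples=100, rng=None):
    """Monte Carlo median smoothing of a regression function f at input x.

    A scalar sigma gives isotropic Gaussian noise (naive median smoothing);
    an array broadcastable to x.shape gives anisotropic noise.
    Returns the coordinate-wise median of f(x + noise) over n_samples draws.
    """
    rng = np.random.default_rng(rng)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    # One Gaussian draw per sample, scaled coordinate-wise by sigma.
    noise = rng.standard_normal((n_samples,) + x.shape) * sigma
    outputs = np.stack([np.asarray(f(x + eps)) for eps in noise])
    return np.median(outputs, axis=0)


def token_sigmas(token_embs, concept_emb, base_sigma=0.1, scale=1.0):
    """Hypothetical per-token noise scales from cosine relevance to a concept.

    token_embs: (T, D) token embeddings; concept_emb: (D,) concept embedding.
    Tokens more similar to the concept get more noise; negative similarity
    is clipped to zero, so unrelated tokens keep base_sigma.
    """
    t = token_embs / np.linalg.norm(token_embs, axis=1, keepdims=True)
    c = concept_emb / np.linalg.norm(concept_emb)
    relevance = np.clip(t @ c, 0.0, None)  # shape (T,)
    return base_sigma * (1.0 + scale * relevance)  # shape (T,)


def adaptive_median_smooth(f, token_embs, concept_emb, base_sigma=0.1,
                           scale=1.0, n_samples=100, rng=None):
    """Token-wise adaptive smoothing: one sigma per token, shared across D."""
    sig = token_sigmas(token_embs, concept_emb, base_sigma, scale)[:, None]  # (T, 1)
    return median_smooth(f, token_embs, sig, n_samples=n_samples, rng=rng)
```

Taking a coordinate-wise median over vector outputs is one simple choice and is itself an assumption here. Because each token's `sigma` is shared across its embedding dimensions, the noise in this sketch is anisotropic across tokens but isotropic within a token; passing a seed as `rng` makes the Monte Carlo estimate reproducible.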