ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection


Authors: Ke Ma, Jun Long, Hongxiao Fei, Liujie Hua, Zhen Dai, Yueyi Luo

Pre-trained Vision-Language Models (VLMs) struggle with Zero-Shot Anomaly Detection (ZSAD) because of a critical adaptation gap: they lack the local inductive biases required for dense prediction, and they rely on inflexible feature-fusion paradigms. We address both limitations with an Architectural Co-Design framework that jointly refines feature representation and cross-modal fusion. First, we propose a parameter-efficient Convolutional Low-Rank Adaptation (Conv-LoRA) adapter that injects local inductive biases for fine-grained representation. Second, we introduce a Dynamic Fusion Gateway (DFG) that uses visual context to adaptively modulate text prompts, enabling bidirectional fusion between the two modalities. Extensive experiments on diverse industrial and medical benchmarks show superior accuracy and robustness, indicating that this co-design of representation and fusion is critical for adapting foundation models to dense perception tasks. The source code is available at https://github.com/cockmake/ACD-CLIP.
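The abstract describes the two components only at a high level. What follows is one hypothetical reading in NumPy, not the authors' implementation (the linked repository is the authoritative source). Assumptions, all illustrative: Conv-LoRA is modeled as a standard LoRA update, W0·x + (alpha/r)·B·A·x, with a 3x3 depthwise convolution applied over the patch-token grid inside the rank-r bottleneck. The DFG is modeled as a sigmoid gate, computed from mean-pooled patch features, that adds a projected visual-context vector to each text prompt embedding. This covers only the vision-to-text direction; the text-to-vision direction appears here only as the patch/prompt similarity map. The names `ConvLoRALinear`, `dynamic_fusion_gateway`, and `anomaly_map`, the temperature 0.07, and the convention that prompt index 1 means "anomalous" are all hypothetical.

```python
import numpy as np


def depthwise_conv3x3(z, kernel):
    """3x3 depthwise convolution (cross-correlation) with zero padding.

    z: (H, W, C) feature grid; kernel: (3, 3, C), one 3x3 filter per channel.
    """
    h, w, _ = z.shape
    padded = np.pad(z, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(z)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out


class ConvLoRALinear:
    """Hypothetical Conv-LoRA: frozen linear layer plus a low-rank update
    whose rank-r bottleneck is convolved over the spatial token grid."""

    def __init__(self, w0, rank, grid_hw, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w0.shape
        self.w0 = w0                                # frozen pre-trained weight
        self.a = rng.normal(0, 0.02, (rank, d_in))  # trainable down-projection
        self.b = np.zeros((d_out, rank))            # trainable up-projection, zero-init
        self.kernel = np.zeros((3, 3, rank))
        self.kernel[1, 1, :] = 1.0                  # identity init of the depthwise conv
        self.grid_hw = grid_hw
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (N, d_in) patch tokens with N == H * W (class token excluded).
        h, w = self.grid_hw
        z = x @ self.a.T                            # (N, r) low-rank codes
        z = depthwise_conv3x3(z.reshape(h, w, -1), self.kernel).reshape(h * w, -1)
        return x @ self.w0.T + self.scale * (z @ self.b.T)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def dynamic_fusion_gateway(text_emb, patch_feats, w_gate, w_proj):
    """Hypothetical DFG: shift each prompt embedding by gated visual context.

    text_emb: (K, D) prompt embeddings; patch_feats: (N, D) image patch features.
    """
    context = patch_feats.mean(axis=0)              # (D,) pooled visual context
    gate = sigmoid(text_emb @ w_gate @ context)     # (K,) one gate per prompt
    shift = context @ w_proj.T                      # (D,) projected context
    return text_emb + gate[:, None] * shift[None, :]


def anomaly_map(patch_feats, text_emb, temperature=0.07):
    """Per-patch probability of the 'anomalous' prompt (index 1) via cosine softmax."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = p @ t.T / temperature                  # (N, K)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs[:, 1]


# Toy usage on random data: a 4x4 patch grid with 8-dim features.
rng = np.random.default_rng(0)
H, W, D = 4, 4, 8
x = rng.normal(size=(H * W, D))
layer = ConvLoRALinear(rng.normal(size=(D, D)), rank=2, grid_hw=(H, W))
feats = layer(x)
text = rng.normal(size=(2, D))                      # ["normal", "anomalous"] prompts
text_mod = dynamic_fusion_gateway(
    text, feats, 0.1 * rng.normal(size=(D, D)), 0.1 * rng.normal(size=(D, D)))
scores = anomaly_map(feats, text_mod).reshape(H, W)
print(scores.shape)  # (4, 4)
```

Zero-initializing `B` (as in the original LoRA formulation) and starting the depthwise kernel at identity mean the adapted layer initially reproduces the frozen layer exactly; training would then update only `A`, `B`, the kernel, and the gate weights, which is what keeps this style of adapter parameter-efficient.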

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Published: 2025-08-11 10:03:45 UTC

arXiv ID: 2508.07819