Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models


Authors: Chao Huang, Yushu Shi, Jie Wen, Wei Wang, Yong Xu, Xiaochun Cao

With advances in visual language models (VLMs) and large language models (LLMs), video anomaly detection (VAD) has progressed beyond binary classification to fine-grained categorization and multidimensional analysis. However, existing methods still focus mainly on coarse-grained detection and do not explain the anomalies they detect. To address these limitations, we propose Ex-VAD, an Explainable Fine-grained Video Anomaly Detection approach that combines fine-grained classification with detailed explanations of anomalies. First, we use a VLM to extract frame-level captions, and an LLM converts them into video-level explanations, improving the model's explainability. Second, integrating these textual explanations of anomalies with visual information substantially improves the model's anomaly detection. Finally, we apply label-enhanced alignment to optimize feature fusion, enabling precise fine-grained detection. Extensive experiments on the UCF-Crime and XD-Violence datasets show that Ex-VAD significantly outperforms existing state-of-the-art methods.
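The abstract describes the pipeline only at a high level. To illustrate the general idea behind the last two steps (fusing visual features with text-explanation features, then aligning the fused feature with class-label embeddings), here is a minimal sketch that assumes a CLIP-style setup: embeddings are already computed, fusion is a simple weighted sum, and alignment is a temperature-scaled cosine-similarity softmax trained with cross-entropy. The functions `fuse`, `classify`, and `alignment_loss`, the weight `alpha`, the temperature of 0.07, and the toy embeddings are all hypothetical and not taken from Ex-VAD; the paper's label-enhanced alignment may work differently.

```python
# Hypothetical sketch of text/visual fusion plus label alignment.
# NOT the Ex-VAD implementation; names and hyperparameters are assumptions.
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(visual: Vector, text: Vector, alpha: float = 0.5) -> Vector:
    """Assumed fusion: weighted sum of unit-normalized visual and
    explanation-text embeddings, re-normalized to unit length."""
    v, t = l2_normalize(visual), l2_normalize(text)
    return l2_normalize([alpha * a + (1.0 - alpha) * b for a, b in zip(v, t)])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def classify(fused: Vector, labels: Dict[str, Vector],
             temperature: float = 0.07) -> Dict[str, float]:
    """CLIP-style alignment: softmax over temperature-scaled cosine
    similarities between the fused feature and each label embedding."""
    logits = {name: cosine(fused, emb) / temperature for name, emb in labels.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(z - peak) for name, z in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def alignment_loss(probs: Dict[str, float], target: str) -> float:
    """Cross-entropy on the true label; minimizing it pulls the fused
    feature toward its label embedding and away from the others."""
    return -math.log(probs[target])


# Toy 3-d embeddings; a real system would obtain these from pretrained encoders.
label_embeddings = {
    "Normal": [1.0, 0.0, 0.0],
    "Fighting": [0.0, 1.0, 0.0],
    "Explosion": [0.0, 0.0, 1.0],
}
visual_feature = [0.1, 0.9, 0.2]       # stand-in for a video-level visual feature
explanation_feature = [0.0, 1.0, 0.1]  # stand-in for an LLM explanation embedding

probs = classify(fuse(visual_feature, explanation_feature), label_embeddings)
prediction = max(probs, key=probs.get)
loss = alignment_loss(probs, "Fighting")
print(prediction, loss)
```

Normalizing both modalities before mixing keeps either one from dominating merely because of its scale, and the low temperature sharpens the softmax so that modest cosine differences still produce confident class probabilities.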

Venue: ICML 2025 (Poster)

OpenReview ID: xAhUoyb5eU
