XAI-Grounded Explanation Generation for Speech Deepfake Detection with Training-Free Multimodal Large Language Models

Authors: Yupei Li, Qiyang Sun, Xiaoliang Wu, Chenxi Wang, Berrak Sisman, Björn W. Schuller

Speech deepfake detection (SDD) systems require trustworthy explanations for reliable decision-making. Existing explanation approaches fall mainly into two categories. Traditional explainable AI (XAI) methods, such as gradient-based attribution, produce low-level attribution signals that are tightly coupled to model decisions but harder for humans to understand than natural-language explanations. Meanwhile, large language model (LLM)-based explanation generation often produces generic, ungrounded descriptions because it lacks heuristic evidence and task-specific supervision, a consequence of the scarcity of grounded explanation datasets for SDD. We therefore propose a training-free explanation framework that integrates XAI evidence with multimodal LLMs to generate grounded and specific explanations. Using the PartialSpoof dataset, we construct a grounded explanation dataset and show that methods incorporating XAI increase inside accuracy by over 45%, as verified through human evaluation and faithfulness checks.
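To make "integrating XAI evidence with an LLM" concrete, here is a minimal sketch under loudly stated assumptions; it is not the authors' pipeline. It assumes a toy detector whose spoof logit is linear in one scalar feature per frame, so gradient-times-input attribution reduces exactly to w_t * x_t. It then groups frames whose attribution exceeds a threshold into time segments (using a hypothetical 20 ms frame shift) and writes those segments into a text prompt for a multimodal LLM. The names grad_x_input, salient_segments, build_prompt, FRAME_SHIFT_S, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

FRAME_SHIFT_S = 0.02  # hypothetical 20 ms frame shift (assumption)


@dataclass
class Segment:
    start_s: float
    end_s: float
    mean_score: float


def grad_x_input(weights: List[float], feats: List[float]) -> List[float]:
    """Gradient-times-input for a toy linear logit z = sum_t w_t * x_t + b.

    Since dz/dx_t = w_t, the attribution of frame t is w_t * x_t.
    A real detector would obtain gradients via autodiff instead.
    """
    return [w * x for w, x in zip(weights, feats)]


def salient_segments(scores: List[float], threshold: float) -> List[Segment]:
    """Group consecutive frames whose attribution exceeds the threshold."""
    segments: List[Segment] = []
    start = None
    # A -inf sentinel guarantees that a run reaching the last frame is closed.
    for t, s in enumerate(scores + [float("-inf")]):
        if s > threshold and start is None:
            start = t
        elif s <= threshold and start is not None:
            run = scores[start:t]
            segments.append(
                Segment(start * FRAME_SHIFT_S, t * FRAME_SHIFT_S, sum(run) / len(run))
            )
            start = None
    return segments


def build_prompt(label: str, segments: List[Segment]) -> str:
    """Turn detector output and attribution segments into hypothetical prompt text."""
    lines = [
        f"The detector classified this utterance as {label}.",
        "Attribution evidence (regions most responsible for the decision):",
    ]
    for seg in segments:
        lines.append(
            f"- {seg.start_s:.2f}s to {seg.end_s:.2f}s, "
            f"mean attribution {seg.mean_score:.2f}"
        )
    lines.append("Explain the decision, referring only to the regions listed above.")
    return "\n".join(lines)


if __name__ == "__main__":
    scores = grad_x_input([0.1, 0.1, 2.0, 2.0, 0.1, 1.5], [1.0, 1.0, 1.0, 0.5, 1.0, 1.0])
    print(build_prompt("spoof", salient_segments(scores, 0.5)))
```

With the toy inputs above and a threshold of 0.5, the sketch finds two segments, 0.04s to 0.08s and 0.10s to 0.12s. Restricting the LLM to the listed regions is one plausible way to keep explanations tied to the evidence rather than generic; in practice the audio itself would presumably also be passed to the multimodal LLM.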

Subjects: Computation and Language, Artificial Intelligence

Published: 2026-06-15 02:55:21 UTC

2606.16137
