Li_Omni-Fake_Benchmarking_Unified_Multimodal_Social_Media_Deepfake_Detection@CVPR2026@CVF


Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

Authors: Tianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Xinze Li, Bingyu Zhu, Wuhui Duan, Congang Chen, Zeyu Fu, Yi Dong, Baoyuan Wu, Xiangtai Li, Guangliang Cheng

Multimodal deepfakes proliferating on social media threaten authenticity, information integrity, and digital forensics. Existing benchmarks are constrained by single-modality scope, simplified manipulations, or unrealistic data distributions, which limits their ability to assess real-world robustness. We present Omni-Fake, a unified omni-dataset for comprehensive multimodal deepfake detection in social-media settings. It comprises Omni-Fake-Set, a large-scale, high-quality dataset with more than 1M samples, and Omni-Fake-OOD, an out-of-distribution benchmark with more than 100k samples intentionally excluded from training to evaluate generalization. Omni-Fake spans four modalities (image, audio, video, and audio-video talking heads) and supports a joint detection-localization-explanation protocol. For images, audio, and videos, we define a ternary task (real / partially manipulated / fully synthetic) with spatial or temporal localization masks for fine-grained reasoning. Talking heads are formulated as a binary audio-video fusion task targeting speaking digital humans and lip-synced avatar forgeries. On top of Omni-Fake, we further propose Omni-Fake-R1, a reinforcement-learning-driven multimodal detector that adaptively integrates visual and auditory cues and outputs structured decisions, localizations, and natural-language explanations. Extensive experiments show significant gains in detection accuracy, cross-modal generalization, and explainability over state-of-the-art baselines. Code will be released.
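As a rough illustration of the joint detection-localization-explanation protocol, the following Python sketch encodes one possible per-sample output record and checks it against the task definitions in the abstract: a ternary label for image, audio, and video, and a binary label for audio-video talking heads. Everything concrete here is a hypothetical assumption, not the benchmark's released format: the class `OmniPrediction`, the helpers `allowed_labels` and `validate`, the label strings `"partial"`, `"full"`, and `"fake"`, the (start, end)-in-seconds segment format, and the consistency rule that real samples carry no localization.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical label sets mirroring the task definitions (not the official schema).
TERNARY_LABELS = {"real", "partial", "full"}  # image, audio, video
BINARY_LABELS = {"real", "fake"}              # audio-video talking head
TERNARY_MODALITIES = {"image", "audio", "video"}


@dataclass
class OmniPrediction:
    """Hypothetical structured output: decision + localization + explanation."""
    modality: str
    label: str
    # Binary H x W mask for images / video frames; None when not applicable.
    spatial_mask: Optional[List[List[int]]] = None
    # Manipulated intervals (start, end) in seconds for audio / video.
    temporal_segments: List[Tuple[float, float]] = field(default_factory=list)
    explanation: str = ""


def allowed_labels(modality: str) -> set:
    """Return the label set assumed for a modality's task."""
    if modality in TERNARY_MODALITIES:
        return TERNARY_LABELS
    if modality == "talking_head":
        return BINARY_LABELS
    raise ValueError(f"unknown modality: {modality}")


def validate(pred: OmniPrediction) -> None:
    """Raise ValueError if the record violates the assumed task rules."""
    if pred.label not in allowed_labels(pred.modality):
        raise ValueError(f"label {pred.label!r} invalid for {pred.modality}")
    for start, end in pred.temporal_segments:
        if not 0.0 <= start < end:
            raise ValueError(f"bad segment ({start}, {end})")
    has_mask = pred.spatial_mask is not None and any(
        v for row in pred.spatial_mask for v in row
    )
    has_localization = has_mask or bool(pred.temporal_segments)
    # Assumed consistency rule: a real sample should not be localized.
    if pred.label == "real" and has_localization:
        raise ValueError("real sample must not carry localization")
```

For example, `validate(OmniPrediction("talking_head", "partial"))` raises `ValueError`, because the talking-head task is binary in this sketch, while a partially manipulated audio clip with a segment such as `(1.2, 2.0)` passes. Keeping the decision, localization, and explanation in one record reflects the abstract's description of structured outputs; how Omni-Fake-R1 actually serializes them is not specified here.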

Subject: CVPR.2026 - Poster