Gradient Inversion of Multimodal Models

Authors: Omri Ben Hemo, Alon Zolfi, Oryan Yehezkel, Omer Hofman, Roman Vainshtein, Hisashi Kojima, Yuval Elovici, Asaf Shabtai

Federated learning (FL) enables privacy-preserving distributed machine learning by sharing gradients instead of raw data. However, FL remains vulnerable to gradient inversion attacks, in which the shared gradients can be exploited to reconstruct sensitive training data. Prior research has concentrated mainly on unimodal tasks, particularly image classification, examining the reconstruction of single-modality data and analyzing privacy vulnerabilities in these relatively simple settings. As multimodal models are increasingly used to address complex vision-language tasks, it becomes essential to assess the privacy risks inherent in these architectures. In this paper, we explore gradient inversion attacks targeting multimodal vision-language Document Visual Question Answering (DQA) models and propose GI-DQA, a novel method that reconstructs private document content from gradients. Through an extensive evaluation of state-of-the-art DQA models, our approach exposes critical privacy vulnerabilities and highlights the urgent need for robust defenses to secure multimodal FL systems.
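For readers unfamiliar with the attack family the abstract references: a gradient inversion attacker starts from a random dummy input and optimizes it so that the gradient it induces on the model matches the gradient the client actually shared. The sketch below is a minimal illustration of this classic gradient-matching recipe (in the spirit of "deep leakage from gradients") on a toy linear classifier with an assumed-known label; the model, tensor shapes, and optimizer settings are illustrative assumptions, not the paper's GI-DQA method, which targets far larger multimodal DQA models.

```python
import torch
import torch.nn as nn

# Toy stand-in for a client model; GI-DQA attacks much larger
# multimodal DQA models, but the inversion principle is the same.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

# --- Client side: the gradient that FL would share with the server ---
x_private = torch.rand(1, 1, 28, 28)   # private training input
y_private = torch.tensor([3])          # private label (assumed known here)
shared_grads = torch.autograd.grad(
    loss_fn(model(x_private), y_private), model.parameters()
)
shared_grads = [g.detach() for g in shared_grads]

# --- Attacker side: optimize a dummy input so its gradient matches ---
x_dummy = torch.rand(1, 1, 28, 28, requires_grad=True)
optimizer = torch.optim.LBFGS([x_dummy])

def closure():
    optimizer.zero_grad()
    # Second-order graph is needed to backprop through the gradients
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_private),
        model.parameters(), create_graph=True,
    )
    # Gradient-matching loss: squared distance to the shared gradients
    match = sum(((dg - sg) ** 2).sum()
                for dg, sg in zip(dummy_grads, shared_grads))
    match.backward()
    return match

for _ in range(30):
    optimizer.step(closure)
# x_dummy now approximates x_private, leaking the private input.
```

LBFGS with a squared-error matching loss is the conventional choice for small models; for larger networks, variants of this attack commonly switch to Adam with a cosine-similarity objective. Reconstructing document images and text, as GI-DQA does, requires additional machinery beyond this sketch.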

Subject: ICML.2025 - Poster