Medical visual question answering (VQA) and federated learning (FL) have emerged as vital approaches for enabling privacy-preserving, collaborative learning across clinical institutions. However, both face significant challenges in cross-modal FL scenarios, where each client holds unpaired images from only a single modality. To address this limitation, we propose X-FLoRA, a cross-modal FL framework that uses modality-expert low-rank adaptation (LoRA) for medical VQA. Specifically, X-FLoRA enables the synthesis of images from one modality into another without requiring data sharing between clients. This is achieved by training a backward translation model within a federated asymmetric translation scheme that integrates clinical semantics from textual data. Additionally, X-FLoRA introduces modality-expert LoRA, which fine-tunes separate LoRA modules to strengthen modality-specific representations for the VQA task. The server aggregates the trained backward translation models and fine-tuned LoRA modules using discriminator quality scores and expert-aware weighting, which regulate the relative contributions of different clients. Experiments on VQA datasets spanning different medical imaging modalities demonstrate that X-FLoRA outperforms existing FL methods in VQA performance.
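For concreteness, the sketch below illustrates one plausible form of the server-side aggregation step described above, assuming each client reports a scalar discriminator quality score for its backward translation model along with per-modality LoRA factors. The function and parameter names (aggregate_lora_experts, quality, temperature) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of quality-scored, expert-aware LoRA aggregation on the server.
# Assumptions (not from the paper): each client update carries a discriminator
# quality score in [0, 1] and LoRA factor pairs (A, B) keyed by modality.
import torch


def aggregate_lora_experts(client_updates, temperature=1.0):
    """Aggregate per-modality LoRA experts across clients.

    client_updates: list of dicts, one per client, each with
        'quality': float discriminator quality score in [0, 1]
        'experts': {modality: {'A': Tensor, 'B': Tensor}} LoRA factors
    Returns {modality: {'A': Tensor, 'B': Tensor}} of aggregated factors.
    """
    modalities = {m for u in client_updates for m in u["experts"]}
    aggregated = {}
    for m in modalities:
        # Only clients that actually hold modality m contribute to its
        # expert -- the "expert-aware" part of the weighting.
        holders = [u for u in client_updates if m in u["experts"]]
        scores = torch.tensor([u["quality"] for u in holders])
        # Softmax over quality scores regulates each client's contribution.
        weights = torch.softmax(scores / temperature, dim=0)
        aggregated[m] = {
            f: sum(w * u["experts"][m][f] for w, u in zip(weights, holders))
            for f in ("A", "B")
        }
    return aggregated


# Toy usage: two clients, each holding one modality, rank-4 LoRA on a
# 16-dimensional layer (modality names are placeholders).
if __name__ == "__main__":
    make_expert = lambda: {"A": torch.randn(4, 16), "B": torch.zeros(16, 4)}
    clients = [
        {"quality": 0.9, "experts": {"xray": make_expert()}},
        {"quality": 0.6, "experts": {"mri": make_expert()}},
    ]
    agg = aggregate_lora_experts(clients)
    print({m: tuple(t.shape for t in v.values()) for m, v in agg.items()})
```

Averaging the LoRA factors A and B separately, rather than their product, mirrors common FedAvg-style practice and is a simplification here; the softmax temperature controls how sharply high-quality clients dominate the aggregate.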