CacheFL: Efficient Federated Cache Model Fine-Tuning for Vision-Language Models

Authors: Mengjun Yi, Hanwen Zhang, Hui Dou, Jian Zhao, Furao Shen

Large pre-trained Vision-Language Models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), have exhibited remarkable zero-shot performance across various image classification tasks. Fine-tuning these models on domain-specific datasets further enhances their effectiveness for downstream applications. However, fine-tuning in cloud environments raises significant concerns regarding data security and privacy. Federated Learning (FL) offers a decentralized solution by enabling model training across local clients without centralizing sensitive data, but the high communication and computation costs of transmitting full pre-trained models during training limit its scalability. Additionally, non-Independent and Identically Distributed (non-IID) data across local clients can negatively impact model convergence and performance. To address these challenges, we propose CacheFL, a novel federated learning method that replaces traditional full model fine-tuning with lightweight cache model fine-tuning. The cache model is initialized using a class-balanced dataset generated by a generative pre-trained model, effectively mitigating the impact of non-IID data. This cache model is then distributed to local clients for fine-tuning, and the updated parameters from each client are aggregated on the server and redistributed. With the updated cache model, the classification performance of CLIP is improved after just a few epochs. By limiting the training and communication to the cache model, CacheFL significantly reduces resource demands while ensuring data privacy and security. Extensive experiments conducted on ImageNet and 10 additional datasets demonstrate that CacheFL outperforms traditional approaches in terms of classification accuracy, resource efficiency, and privacy preservation.
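The communication round described in the abstract (distribute the cache model, fine-tune it locally, aggregate on the server, redistribute) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the aggregation rule, so a sample-size-weighted average in the style of FedAvg is assumed here, and the names `CacheParams`, `aggregate_cache_params`, `run_round`, and `local_update` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: each named cache tensor flattened to a list of floats.
CacheParams = Dict[str, List[float]]


def aggregate_cache_params(
    client_params: Sequence[CacheParams],
    client_sizes: Sequence[int],
) -> CacheParams:
    """Sample-size-weighted average of client cache parameters.

    Assumption: FedAvg-style weighting; the source does not specify the rule.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: CacheParams = {}
    for name, reference in client_params[0].items():
        averaged = [0.0] * len(reference)
        for params, size in zip(client_params, client_sizes):
            weight = size / total
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        aggregated[name] = averaged
    return aggregated


def run_round(
    global_params: CacheParams,
    client_datasets: Sequence[Sequence[object]],
    local_update: Callable[[CacheParams, Sequence[object]], CacheParams],
) -> CacheParams:
    """One round: broadcast the cache model, fine-tune locally, aggregate."""
    updates: List[CacheParams] = []
    sizes: List[int] = []
    for data in client_datasets:
        # Each client starts from its own copy of the global cache model.
        local_copy = {name: list(values) for name, values in global_params.items()}
        updates.append(local_update(local_copy, data))
        sizes.append(len(data))
    return aggregate_cache_params(updates, sizes)
```

Only the cache parameters pass through `run_round`; the CLIP backbone stays frozen on each client and is never transmitted, which is where the reduction in communication cost comes from.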

Subject: Distributed, Parallel, and Cluster Computing

Published: 2025-05-08 11:07:35 UTC

arXiv: 2505.05130