Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations

Authors: Xin Guo, Chunrui Zhao, Hong Jia, Ting Dang, Gongping Huang, Xianrui Zheng, Yan Gao

Integrating Federated Learning (FL) with self-supervised learning (SSL) enables privacy-preserving fine-tuning for speech tasks. However, federated environments exhibit significant heterogeneity: clients differ in computational capacity, causing straggler effects under unified fine-tuning, while diverse downstream tasks require different representation depths, making full-model updates inefficient. To address these challenges, we propose an adaptive federated fine-tuning framework with early exits. Lightweight prediction heads are inserted at intermediate layers of the SSL backbone, allowing clients to terminate computation based on local constraints and task requirements. We further introduce a layer-wise, depth-aware partial aggregation strategy to better utilize representations from different network depths. Experiments show that the framework reduces edge overhead, supports heterogeneous hardware, and maintains competitive performance in resource-constrained federated environments.
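The abstract names two mechanisms: clients choosing an early-exit depth from local constraints, and a layer-wise partial aggregation on the server. The abstract does not give the exact exit rule or the depth-aware weighting, so the sketch below is illustrative only and is not the authors' method. It assumes two things. First, a client runs backbone layers until a per-layer cost would exceed its compute budget. Second, the server uses plain sample-weighted FedAvg applied per layer, and only over the clients that actually trained that layer. Layers that no client reached keep their global values. The names `choose_exit_depth`, `depth_aware_aggregate`, `layer_costs` and `budget` are hypothetical, and each layer's parameters are modeled as a flat list of floats.

```python
from typing import Dict, List

# Hypothetical toy representation: one layer's parameters as a flat list of floats.
Params = List[float]


def choose_exit_depth(layer_costs: List[float], budget: float) -> int:
    """Hypothetical exit rule: run layers in order until the cumulative cost
    would exceed the client's budget; always run at least one layer."""
    total, depth = 0.0, 0
    for cost in layer_costs:
        if total + cost > budget:
            break
        total += cost
        depth += 1
    return max(depth, 1)


def depth_aware_aggregate(
    global_layers: List[Params],
    client_updates: List[Dict[int, Params]],
    client_weights: List[float],
) -> List[Params]:
    """Layer-wise partial FedAvg (an assumed stand-in for the paper's strategy):
    each layer is averaged only over clients whose update contains it,
    weighted by client_weights; layers no client trained are copied unchanged."""
    new_layers: List[Params] = []
    for idx, g in enumerate(global_layers):
        # Collect (parameters, weight) pairs from clients that trained layer idx.
        contributors = [
            (update[idx], w)
            for update, w in zip(client_updates, client_weights)
            if idx in update
        ]
        if not contributors:
            new_layers.append(list(g))  # no client reached this depth
            continue
        total_w = sum(w for _, w in contributors)
        averaged = [
            sum(p[j] * w for p, w in contributors) / total_w
            for j in range(len(g))
        ]
        new_layers.append(averaged)
    return new_layers


if __name__ == "__main__":
    costs = [1.0, 1.0, 1.0]
    depth_a = choose_exit_depth(costs, budget=2.5)  # strong client: 2 layers
    depth_b = choose_exit_depth(costs, budget=1.5)  # weak client: 1 layer
    global_layers = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0]]
    updates = [
        {i: [2.0, 2.0] for i in range(depth_a)},
        {i: [4.0, 4.0] for i in range(depth_b)},
    ]
    print(depth_aware_aggregate(global_layers, updates, [1.0, 1.0]))
```

In this example, layer 0 averages both clients to `[3.0, 3.0]`. Layer 1 takes only the stronger client's `[2.0, 2.0]`. Layer 2, which neither client reached, keeps its global value `[5.0, 5.0]`. Restricting each average to the clients that trained a layer stops shallow-exit clients from dragging deeper layers back toward stale values. Per-exit prediction heads could be handled the same way by giving them their own layer indices.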

Subject: Audio and Speech Processing

Published: 2026-03-23 12:14:32 UTC

2603.21888