SparQLe: Speech Queries to Text Translation Through LLMs

2025.iwslt-1.6@ACL

Authors: Amirbek Djanibekov, Hanan Aldarmaki

With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding. This study introduces a novel approach that combines self-supervised speech representations with instruction-tuned LLMs for speech-to-text translation. The approach uses a modality adapter to align extracted speech features with instruction-tuned LLMs using English speech data. Our experiments demonstrate that this method preserves the semantic content of the input speech and serves as an effective bridge between self-supervised speech models and instruction-tuned LLMs, offering a promising direction for various speech understanding applications.
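The abstract does not describe the internals of the modality adapter. Purely as an illustration of the general idea, the sketch below assumes a query-based design: a fixed number of learnable query vectors cross-attend over the speech frames, and the pooled result is projected to the LLM embedding width. The class name `QueryAdapter`, all dimensions, and the random weights are hypothetical and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a real adapter would learn these by training."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class QueryAdapter:
    """Hypothetical adapter: learnable queries cross-attend over speech frames."""

    def __init__(self, speech_dim, llm_dim, num_queries, seed=0):
        rng = random.Random(seed)
        self.queries = random_matrix(num_queries, speech_dim, rng)
        self.w_k = random_matrix(speech_dim, speech_dim, rng)
        self.w_v = random_matrix(speech_dim, speech_dim, rng)
        self.w_out = random_matrix(speech_dim, llm_dim, rng)
        self.scale = 1.0 / math.sqrt(speech_dim)

    def __call__(self, speech_features):
        keys = matmul(speech_features, self.w_k)        # frames x speech_dim
        values = matmul(speech_features, self.w_v)      # frames x speech_dim
        keys_t = [list(col) for col in zip(*keys)]      # speech_dim x frames
        scores = matmul(self.queries, keys_t)           # queries x frames
        weights = [softmax([s * self.scale for s in row]) for row in scores]
        pooled = matmul(weights, values)                # queries x speech_dim
        return matmul(pooled, self.w_out)               # queries x llm_dim


# Stand-in for features from a self-supervised speech encoder: 50 frames, width 16.
frames = random_matrix(50, 16, random.Random(1))
adapter = QueryAdapter(speech_dim=16, llm_dim=32, num_queries=4)
soft_prompt = adapter(frames)  # 4 vectors of width 32, whatever the frame count
```

Under this assumption, the adapter output would be placed alongside the embedded text instruction as input to the LLM. The fixed number of queries keeps the speech prefix short regardless of utterance length.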

Subject: IWSLT.2025