manhtienanh24@interspeech_2024@ISCA

#1 Improving Speech Recognition with Prompt-based Contextualized ASR and LLM-based Re-predictor

Authors: Nguyen Manh Tien Anh ; Thach Ho Sy

In recent years, advancements in automatic speech recognition (ASR) systems have led to their widespread use in applications such as call center bots and virtual assistants. However, these systems still struggle with adverse speech conditions, missing contextual information, and rare words. In this paper, we propose a novel architecture that addresses these limitations by integrating Large Language Models (LLMs) and prompt mechanisms to improve ASR accuracy. The method combines a pre-trained text encoder, a text adapter for task-specific adaptation, and an efficient LLM-based re-prediction mechanism, and it performs well in a variety of real-world scenarios. Compared to a baseline ASR system on multiple public datasets, the proposed system achieves an average relative word error rate improvement of 27% for conventional tasks, 30% for utterance-level contextual tasks, and 33% for word-level biasing tasks.
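
The reported gains are relative word error rate (WER) improvements, so a short sketch of how such a figure is typically computed may help. The code below is not taken from the paper: it assumes the standard definition of WER as word-level edit distance divided by reference length, and the function names `word_error_rate` and `relative_wer_improvement` are hypothetical. The example WER values in the usage note are made up for illustration and are not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which the system reduces WER relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer
```

Under these assumptions, a hypothetical baseline WER of 10.0% and a system WER of 7.3% would give `relative_wer_improvement(0.10, 0.073)` of roughly 0.27, that is, a 27% relative improvement, even though the absolute difference is only 2.7 percentage points.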