While LLM-based automatic speech recognition (LLM-ASR) has demonstrated efficacy through direct acoustic-to-text mapping, its implicit alignment often fails to capture phonetic relationships in Chinese, leading to pronunciation confusion and homophone errors. This paper proposes Pinyin-Guided ASR (PYG-ASR), which modifies LLM-ASR to map acoustic features to both Pinyin and text tokens simultaneously, enriching the linguistic representation. PYG-ASR then leverages the generated Pinyin alongside the text for error correction, prompting a text LLM to refine transcriptions without fine-tuning. Furthermore, the error-correction phase inherently enables context biasing: bias phrases are filtered through Pinyin matching and incorporated into the prompt. Experiments show that PYG-ASR reduces CER by 25% on the AISHELL-1 test set. Additionally, our approach achieves a 49.2% relative CER reduction on bias phrases on the AISHELL-1 test set after contextual biasing.
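The Pinyin-based filtering step can be illustrated with a minimal sketch: bias phrases whose Pinyin appears contiguously in the hypothesis Pinyin are kept and spliced into the correction prompt. The character-to-Pinyin table, function names, and prompt template below are illustrative assumptions, not the paper's exact implementation (a real system would use a full lexicon, e.g. the pypinyin library).

```python
# Illustrative character -> Pinyin table (an assumption for this sketch;
# a production system would use a complete pronunciation lexicon).
CHAR2PY = {
    "北": "bei", "京": "jing", "上": "shang", "海": "hai",
    "背": "bei", "经": "jing", "天": "tian", "气": "qi",
}

def to_pinyin(text):
    """Map a Chinese string to its Pinyin syllable sequence."""
    return [CHAR2PY[ch] for ch in text if ch in CHAR2PY]

def filter_bias_phrases(hypothesis, bias_phrases):
    """Keep bias phrases whose Pinyin occurs contiguously in the hypothesis Pinyin."""
    hyp_py = to_pinyin(hypothesis)
    kept = []
    for phrase in bias_phrases:
        p = to_pinyin(phrase)
        if p and any(hyp_py[i:i + len(p)] == p
                     for i in range(len(hyp_py) - len(p) + 1)):
            kept.append(phrase)
    return kept

def build_prompt(hypothesis, pinyin, matched):
    """Assemble a correction prompt for the text LLM (template is hypothetical)."""
    return (f"Hypothesis: {hypothesis}\n"
            f"Pinyin: {' '.join(pinyin)}\n"
            f"Likely intended phrases: {', '.join(matched)}\n"
            "Correct any homophone errors and output the final transcription.")

# Homophone error: 背经 ("bei jing") transcribed instead of the place name 北京.
hyp = "背经天气"
matched = filter_bias_phrases(hyp, ["北京", "上海"])
print(matched)  # ["北京"] — its Pinyin "bei jing" matches; 上海 is filtered out
print(build_prompt(hyp, to_pinyin(hyp), matched))
```

Matching on Pinyin rather than surface characters is what lets the bias list survive homophone errors in the first-pass hypothesis: the misrecognized characters still carry the correct pronunciation.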