2508.03937

Total: 1

#1 LCS-CTC: Leveraging Soft Alignments to Enhance Phonetic Transcription Robustness [PDF1] [Copy] [Kimi] [REL]

Authors: Zongli Ye, Jiachen Lian, Akshaj Gupta, Xuanru Zhou, Haodong Li, Krish Patel, Hwi Joo Park, Dingkun Zhou, Chenxu Guo, Shuhe Li, Sam Wang, Iris Zhou, Cheol Jun Cho, Zoe Ezzes, Jet M. J. Vonk, Brittany T. Morin, Rian Bogley, Lisa Wauters, Zachary A. Miller, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli

Phonetic speech transcription is crucial for fine-grained linguistic analysis and downstream speech applications. While Connectionist Temporal Classification (CTC) is a widely used approach for such tasks due to its efficiency, it often falls short in recognition performance, especially under unclear and nonfluent speech. In this work, we propose LCS-CTC, a two-stage framework for phoneme-level speech recognition that combines a similarity-aware local alignment algorithm with a constrained CTC training objective. By predicting fine-grained frame-phoneme cost matrices and applying a modified Longest Common Subsequence (LCS) algorithm, our method identifies high-confidence alignment zones which are used to constrain the CTC decoding path space, thereby reducing overfitting and improving generalization ability, which enables both robust recognition and text-free forced alignment. Experiments on both LibriSpeech and PPA demonstrate that LCS-CTC consistently outperforms vanilla CTC baselines, suggesting its potential to unify phoneme modeling across fluent and non-fluent speech.

Subject: Audio and Speech Processing

Publish: 2025-08-05 21:59:35 UTC