wang24q@interspeech_2024@ISCA

Total: 1

#1 Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

Authors: Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

Despite the advances of end-to-end (E2E) models in speech recognition, named entity recognition (NER) remains challenging yet critical for semantic understanding. Previous studies mainly focus on rule-based or attention-based contextual biasing algorithms. However, their performance can be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along with a risk of false triggering. Inspired by the success of the class-based language model (LM) for NER in conventional hybrid systems and the effective decoupling of acoustic and linguistic information in the factorized neural Transducer (FNT), we propose C-FNT, a novel E2E model that incorporates class-based LMs into FNT. In C-FNT, the LM score of a named entity can be associated with its name class instead of its surface form. Experimental results show that the proposed C-FNT significantly reduces errors on named entities without hurting performance on general word recognition.
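The core idea described in the abstract, scoring a named entity via its class token rather than its surface form, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy unigram LM, the `<name>` class token, the uniform within-class probability, and the function `class_lm_score` are all assumptions for illustration.

```python
import math

# Toy unigram "LM" over regular words plus a class token (assumed values,
# not from the paper). In C-FNT-style scoring, the class token stands in
# for any entity belonging to that class.
LOG_P = {
    "call": math.log(0.2),
    "<name>": math.log(0.1),   # class token covering person names
    "now": math.log(0.15),
}

# Hypothetical per-class entity list with a uniform within-class probability.
CLASS_ENTITIES = {"<name>": ["alice", "bob", "carol", "dave"]}

def class_lm_score(tokens, entity_spans):
    """Sum token log-probs, mapping entity surface forms to class tokens.

    entity_spans: {position: class_token} marking which tokens are entities.
    """
    score = 0.0
    for i, tok in enumerate(tokens):
        if i in entity_spans:
            cls = entity_spans[i]
            # LM contribution comes from the class, not the surface form ...
            score += LOG_P[cls]
            # ... plus P(surface | class), here uniform over the class list.
            score += -math.log(len(CLASS_ENTITIES[cls]))
        else:
            score += LOG_P[tok]
    return score
```

Under this scheme, "call alice now" and "call bob now" receive identical LM scores, since both names map to the same `<name>` class; this is what makes rare entity names no longer penalized by their unseen surface forms.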