andrusenko24@interspeech_2024@ISCA


Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter

Authors: Andrei Andrusenko ; Aleksandr Laptev ; Vladimir Bataev ; Vitaly Lavrukhin ; Boris Ginsburg

Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods require modifying the ASR model or the beam-search decoding algorithm, which complicates model reuse and slows down inference. This work presents a new approach to fast context-biasing with a CTC-based Word Spotter (CTC-WS) for CTC and Transducer (RNN-T) ASR models. The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates. Valid candidates then replace the corresponding greedy-decoding output within their frame intervals. A hybrid Transducer-CTC model makes CTC-WS applicable to Transducer models. The results show that the method substantially accelerates context-biased recognition while also improving F-score and WER compared with baseline methods. The proposed method is publicly available in the NVIDIA NeMo toolkit.
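The spot-and-replace idea from the abstract can be illustrated with a minimal Python sketch. It is not the NeMo implementation, and it rests on several assumptions. Token id 0 is the CTC blank. Log-probabilities arrive as a T×V nested list. Each context word is scored on its own with a Viterbi CTC alignment, which replaces the paper's compact context graph. A hypothetical per-token `bonus` decides whether a spotted word beats the greedy path over the same frames. The function names `greedy_ctc`, `spot_word` and `context_bias` are illustrative and are not NeMo APIs. The sketch covers only the CTC case; for Transducer models the abstract describes taking the CTC log-probabilities from a hybrid Transducer-CTC model.

```python
import math
from typing import Dict, Iterator, List, Tuple

BLANK = 0  # assumption: token id 0 is the CTC blank


def greedy_ctc(log_probs: List[List[float]]) -> List[Tuple[int, int]]:
    """Greedy CTC decoding that keeps the frame where each token is emitted."""
    out, prev = [], BLANK
    for t, frame in enumerate(log_probs):
        tok = max(range(len(frame)), key=frame.__getitem__)
        if tok != BLANK and tok != prev:
            out.append((tok, t))
        prev = tok
    return out


def spot_word(log_probs: List[List[float]], word: List[int],
              start: int) -> Iterator[Tuple[int, float]]:
    """Viterbi CTC alignment of a non-empty `word` starting at frame `start`.

    Yields (end_frame, log_score) for every frame at which the best path
    has just emitted the last token of `word`.
    """
    # Extended CTC label sequence with blanks only between tokens.
    ext: List[int] = []
    for i, tok in enumerate(word):
        if i > 0:
            ext.append(BLANK)
        ext.append(tok)
    alpha = [-math.inf] * len(ext)
    alpha[0] = log_probs[start][ext[0]]
    for t in range(start, len(log_probs)):
        if t > start:
            new = [-math.inf] * len(ext)
            for j, lab in enumerate(ext):
                best = alpha[j]                      # stay on the same label
                if j >= 1:
                    best = max(best, alpha[j - 1])   # advance by one label
                if j >= 2 and lab != BLANK and lab != ext[j - 2]:
                    best = max(best, alpha[j - 2])   # skip the optional blank
                new[j] = best + log_probs[t][lab]
            alpha = new
        if alpha[-1] > -math.inf:
            yield t, alpha[-1]


def context_bias(log_probs: List[List[float]], context: Dict[str, List[int]],
                 bonus: float = 1.0) -> Tuple[List[int], List[str]]:
    """Replace greedy tokens with spotted context words where they win."""
    # Prefix sums of the per-frame greedy (argmax) log-probabilities.
    prefix = [0.0]
    for frame in log_probs:
        prefix.append(prefix[-1] + max(frame))
    cands = []
    for name, word in context.items():
        for s in range(len(log_probs)):
            for e, score in spot_word(log_probs, word, s):
                greedy = prefix[e + 1] - prefix[s]
                # Hypothetical acceptance rule: with a per-token bonus, the
                # word must beat the greedy path over the same frame interval.
                margin = score + bonus * len(word) - greedy
                if margin > 0:
                    cands.append((margin, s, e, name, word))
    # Keep the highest-margin candidates whose frame intervals do not overlap.
    chosen, used = [], set()
    for margin, s, e, name, word in sorted(cands, key=lambda c: -c[0]):
        if used.isdisjoint(range(s, e + 1)):
            chosen.append((s, name, word))
            used.update(range(s, e + 1))
    # Greedy tokens outside the chosen intervals survive; words fill the gaps.
    pieces = [(t, [tok]) for tok, t in greedy_ctc(log_probs) if t not in used]
    pieces += [(s, list(word)) for s, _, word in chosen]
    pieces.sort(key=lambda p: p[0])
    tokens = [tok for _, toks in pieces for tok in toks]
    return tokens, [name for _, name, _ in sorted(chosen)]
```

Consider a toy five-frame input where greedy decoding reads `c o t` but `a` is a close runner-up at the second frame. In that case `context_bias(lp, {"cat": [1, 2, 3]})` replaces the greedy span with the tokens for `cat`, while `bonus=0.0` leaves the greedy output unchanged. Scoring each word against the greedy path over the same interval keeps unrelated stretches of audio from being rewritten. However, the per-word, per-start-frame search here costs O(words × T² × word length). The paper's shared context graph is designed to avoid exactly that cost.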