takagi24@interspeech_2024@ISCA


Text-only Domain Adaptation for CTC-based Speech Recognition through Substitution of Implicit Linguistic Information in the Search Space

Authors: Tatsunari Takagi ; Yukoh Wakabayashi ; Atsunori Ogawa ; Norihide Kitaoka

Domain adaptation using only language models in Automatic Speech Recognition (ASR) has been widely studied because of its practicality. Still, it remains challenging for non-autoregressive ASR models such as those based on Connectionist Temporal Classification (CTC). Against this background, this study proposes a text-only domain adaptation method for CTC-based ASR models that leverages the Density Ratio Approach (DRA). Our method combines a beam search algorithm that substitutes linguistic information via DRA, adapted to the CTC decoding procedure, with a language model adaptation method that takes the conditional independence assumption of CTC into account. We conducted domain adaptation experiments on character-level ASR with the Corpus of Spontaneous Japanese (CSJ) and on sub-word ASR with the English-language LibriSpeech and GigaSpeech corpora. The experimental results confirmed that the proposed method achieved higher accuracy than Shallow Fusion in both Japanese and English.
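
To make the contrast between Shallow Fusion and the Density Ratio Approach concrete, the following is a minimal sketch of the two hypothesis scores as they are commonly formulated in the general literature. It is not the paper's CTC-specific beam search or its language model adaptation. All names, weights, and log-probability values here are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, NamedTuple


class HypScores(NamedTuple):
    """Log-probabilities for one hypothesis (toy values, not real model output)."""
    log_p_asr: float        # log p(y | x) from the ASR model
    log_p_target_lm: float  # log p(y) under the target-domain LM
    log_p_source_lm: float  # log p(y) under the source-domain LM


def shallow_fusion_score(h: HypScores, lm_weight: float = 0.3) -> float:
    # Shallow Fusion: add the weighted target-domain LM score to the ASR score.
    return h.log_p_asr + lm_weight * h.log_p_target_lm


def density_ratio_score(
    h: HypScores, target_weight: float = 0.3, source_weight: float = 0.3
) -> float:
    # Density Ratio Approach: add the target-domain LM score and subtract the
    # source-domain LM score, which stands in for the linguistic information
    # the ASR model absorbed implicitly from its training data.
    return (
        h.log_p_asr
        + target_weight * h.log_p_target_lm
        - source_weight * h.log_p_source_lm
    )


def pick_best(hyps: Dict[str, HypScores], score_fn: Callable[[HypScores], float]) -> str:
    # Return the hypothesis label with the highest combined score.
    return max(hyps, key=lambda label: score_fn(hyps[label]))


# Two hypothetical competing hypotheses for the same utterance.
TOY_HYPS = {
    "domain phrasing": HypScores(log_p_asr=-4.0, log_p_target_lm=-2.0, log_p_source_lm=-6.0),
    "general phrasing": HypScores(log_p_asr=-3.0, log_p_target_lm=-4.0, log_p_source_lm=-2.0),
}

if __name__ == "__main__":
    print("Shallow Fusion picks:", pick_best(TOY_HYPS, shallow_fusion_score))
    print("Density Ratio picks: ", pick_best(TOY_HYPS, density_ratio_score))
```

With these toy numbers, Shallow Fusion keeps the hypothesis favored by the source-biased ASR score, while the density ratio score selects the target-domain hypothesis because the source LM term is discounted. In practice the weights would be tuned on held-out target-domain data.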