okabe25@interspeech_2025@ISCA

#1 Simultaneous Masked and Unmasked Decoding with Speculative Decoding Masking for Fast ASR without Accuracy Loss

Authors: Koji Okabe, Hitoshi Yamamoto

In this paper, we introduce two methods for partially autoregressive (PAR) decoding in automatic speech recognition (ASR): Simultaneous Masked and Unmasked Decoding (SMUD) and speculative decoding masking. These methods achieve the same recognition accuracy as autoregressive (AR) decoding at a lower computational cost. Both methods accurately identify hypotheses for which decoder score computation can be omitted. Skipping those computations speeds up processing while yielding the same search results as AR decoding. In TED-LIUM2 evaluations, SMUD with speculative decoding masking achieved a word error rate (WER) of 7.3% and a real-time factor (RTF) of 0.41, compared with a WER of 7.3% and an RTF of 0.59 for AR decoding, showing that the method matches AR accuracy while improving computational efficiency.
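
To put the reported RTF figures in perspective, the short sketch below derives the relative reduction in processing time and the equivalent speedup factor from the two RTF values quoted above. It assumes the usual definition of RTF as processing time divided by audio duration. The function names are hypothetical, and the derived percentages are simple arithmetic on the reported numbers, not results stated by the authors.

```python
def relative_rtf_reduction(baseline_rtf: float, new_rtf: float) -> float:
    """Fraction of processing time saved relative to the baseline RTF."""
    if baseline_rtf <= 0:
        raise ValueError("baseline_rtf must be positive")
    return (baseline_rtf - new_rtf) / baseline_rtf


def speedup_factor(baseline_rtf: float, new_rtf: float) -> float:
    """How many times faster the new decoder is than the baseline."""
    if new_rtf <= 0:
        raise ValueError("new_rtf must be positive")
    return baseline_rtf / new_rtf


# RTF values reported in the abstract for TED-LIUM2.
AR_RTF = 0.59    # autoregressive decoding
SMUD_RTF = 0.41  # SMUD with speculative decoding masking

reduction = relative_rtf_reduction(AR_RTF, SMUD_RTF)
factor = speedup_factor(AR_RTF, SMUD_RTF)

print(f"Relative RTF reduction: {reduction:.1%}")  # 30.5%
print(f"Speedup factor: {factor:.2f}x")            # 1.44x
```

Under these assumptions, the reported figures correspond to roughly a 30% reduction in decoding time at an unchanged WER of 7.3%.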

Subject: INTERSPEECH.2025 - Speech Recognition