wei24@interspeech_2024@ISCA


#1 Prompt Tuning for Speech Recognition on Unknown Spoken Name Entities

Authors: Xizi Wei ; Stephen McGregor

This paper explores the challenge of recognising relevant but previously unheard named entities in spoken input. This scenario pertains to real-world applications where retraining an automatic speech recognition (ASR) model on new entity phrases may not be efficient. We propose a technique that involves fine-tuning a Whisper model with a list of entity phrases as prompts. We establish a task-specific dataset in which entity phrases are stratified to support evaluation of three different scenarios in which entities might be encountered. We focus our analysis on a seen-but-unheard scenario, reflecting a situation where only textual representations of novel entity phrases are available for a commercial banking assistant bot. We show that a model tuned to anticipate prompts reflecting novel named entities substantially improves entity recall over non-tuned baseline models, and meaningfully improves performance over models fine-tuned without a prompt.
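To make the evaluation idea concrete, the following is a minimal sketch of how an entity-list prompt and an entity recall score might be computed. It is not the authors' implementation: the abstract does not specify the prompt format or the exact recall definition, so the comma-joined prompt, the exact-match rule after normalisation, and the names `build_prompt`, `normalise`, and `entity_recall` are all assumptions made for illustration. The entity phrases in the usage note are invented.

```python
import re


def normalise(text: str) -> str:
    """Lowercase and replace punctuation with spaces so matching ignores casing and commas."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def build_prompt(entities: list[str]) -> str:
    """Join entity phrases into a single prompt string.

    Assumption: a comma-separated list. The paper's actual prompt format
    is not described in the abstract.
    """
    return ", ".join(entities)


def entity_recall(references: list[list[str]], hypotheses: list[str]) -> float:
    """Fraction of reference entity phrases that appear in the paired hypothesis.

    Assumption: an entity counts as recalled only if its normalised form
    occurs as a whole-word sequence in the normalised transcript.
    """
    total = 0
    hits = 0
    for entities, hypothesis in zip(references, hypotheses):
        padded = f" {normalise(hypothesis)} "
        for entity in entities:
            total += 1
            if f" {normalise(entity)} " in padded:
                hits += 1
    return hits / total if total else 0.0
```

For example, with the hypothetical entities `["Acme Saver"]` and `["Blue Horizon Fund"]` and the transcripts `"move fifty pounds to my acme saver"` and `"open a blue horizon fun account"`, the first entity is recalled and the second is not, giving a recall of 0.5. With the open-source `whisper` Python package, a string such as the one returned by `build_prompt` could be supplied through the `initial_prompt` argument of `transcribe`; whether that matches the prompting setup used in the paper is not stated here.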