#1 Adding User Feedback To Enhance CB-Whisper

Author: Raul Monteiro

Contextual biasing has been shown to improve Whisper's recall of named entities and domain-specific words. Recent work on CB-Whisper goes a step further by integrating an open-vocabulary keyword-spotting (OV-KWS) classifier that retrieves keywords from an external database to form a restricted biasing list. However, the system depends heavily on text-to-speech (TTS) models to generate the keyword audio, and so inherits their weaknesses, in particular when synthesizing graphemes with non-trivial phonetic transcriptions. This work proposes an extension to CB-Whisper that leverages user feedback to extend the keyword database with audio extracted from natural speech. We experiment with different learning strategies for the OV-KWS classifier to assess its domain generalization across TTS-generated and natural-speech keyword audio, as well as to unseen languages.