2505.04237


Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

Authors: Rauf Nasretdinov, Roman Korostik, Ante Jukić

In this work, we investigate the application of generative speech enhancement to improve the robustness of automatic speech recognition (ASR) models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and of different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze speech recognition performance when using different pre-trained ASR models. The proposed approach significantly reduces the word error rate (WER): by approximately 40% relative to the unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
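The reported gains are relative WER reductions. The following is a minimal sketch, not the authors' evaluation code; the function names `word_error_rate` and `relative_reduction` are hypothetical. It shows how WER is commonly computed (word-level edit distance divided by the number of reference words) and how a relative reduction compares a new WER against a baseline WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein distance (whitespace tokenization,
    no text normalization; real evaluations usually normalize first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
    # Hypothetical WERs, not values from the paper: 10.0% -> 6.0%
    print(round(relative_reduction(0.10, 0.06), 2))  # 0.4
```

With these hypothetical numbers, a drop from 10.0% to 6.0% WER is a 40% relative reduction. The abstract's "approximately 40%" is a relative figure of this kind. The abstract does not state the absolute WERs behind it.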

Subject: Audio and Speech Processing

Published: 2025-05-07 08:40:50 UTC