2603.09725

#1 A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition

Authors: Dimme de Groot, Yuanyuan Zhang, Jorge Martinez, Odette Scharenborg

We present DRES, a 1.5-hour dataset of Dutch realistic elicited (semi-spontaneous) speech from 80 speakers recorded in noisy, public indoor environments. DRES was designed as a test set for evaluating state-of-the-art (SOTA) automatic speech recognition (ASR) and speech enhancement (SE) models in a real-world scenario: a person speaking in a public indoor space with background talkers and noise. The speech was recorded with a four-channel linear microphone array. In this work, we evaluate the speech quality of five well-known single-channel SE algorithms and the recognition performance of eight SOTA off-the-shelf ASR models before and after applying SE to the speech of DRES. We found that five of the eight ASR models achieve word error rates (WERs) below 22% on DRES, despite the challenging conditions. In contrast to recent work, we did not find a positive effect of modern single-channel SE on ASR performance, which emphasizes the importance of evaluating in realistic conditions.
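The recognition results above are reported as WER. As a minimal sketch of how such a score is commonly computed (word-level edit distance divided by the number of reference words), the Python below may help. The function name `word_error_rate` and the example sentences are illustrative assumptions. The abstract does not describe the authors' scoring tool or text normalization, so this is not their pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace with no further normalization
    (casing, punctuation); real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from DRES.
reference = "de kat zit op de mat"
hypothesis = "de kat zit op mat"
wer = word_error_rate(reference, hypothesis)
```

To compare recognition before and after SE in this style, the same function would be applied to the ASR output for the unprocessed and the enhanced audio against one shared reference transcript. Because insertions count as errors, WER can exceed 100%.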

Subject: Audio and Speech Processing

Publish: 2026-03-10 14:32:12 UTC