Speech enhancement models have traditionally relied on VoiceBank-DEMAND for training and evaluation. However, this dataset has significant shortcomings: it offers limited diversity and relies on simulated noise conditions. As an alternative, we propose evaluating the generalization capabilities of recent speech enhancement models on CommonPhone, a multilingual, crowdsourced dataset, and demonstrate the usefulness of this approach. Because CommonPhone is derived from CommonVoice, it enables analysis of enhancement performance by demographic variables such as age and gender. Our experiments reveal significant performance variations across these variables. We also introduce a new benchmark dataset designed to challenge enhancement models with difficult and diverse speech samples, facilitating future research in universal speech enhancement.