siegert25@interspeech_2025@ISCA

#1 Queer Waves: A German Speech Dataset Capturing Gender and Sexual Diversity from Podcasts and YouTube

Authors: Ingo Siegert, Jan Marquenie, Sven Grawunder

Developing equitable and inclusive speech technologies requires datasets that represent the full spectrum of human voices, including those of LGBTQIA+ speakers. However, capturing spontaneous, high-quality audio from marginalized gender and sexual identities presents significant ethical, logistical, and representational challenges. This paper introduces Queer Waves, a German speech corpus compiled from podcast and YouTube content featuring self-identified LGBTQIA+ speakers, with a particular focus on diverse gender identities and sexual orientations. We further address the legal and ethical considerations inherent in collecting sensitive personal data. The Queer Waves corpus comprises approximately 335 hours of speech from over 400 self-identified LGBTQIA+ speakers, spanning ages from 18 to 86 years. By expanding representation across a wide range of gender identities and orientations, Queer Waves aims to advance the development of fairer and more accurate speech technologies.

Subject: INTERSPEECH.2025 - Modelling and Learning