HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification

Authors: David Krongauz, Hido Pinto, Sarah Kohn, Yanir Marmor, Eran Segal

Human speech contains paralinguistic cues that reflect a speaker's physiological and neurological state, potentially enabling non-invasive detection of various medical phenotypes. We introduce the Human Phenotype Project Voice corpus (HPP-Voice): a dataset of 7,188 recordings in which Hebrew-speaking adults count for 30 seconds, with each speaker linked to up to 15 potentially voice-related phenotypes spanning respiratory, sleep, mental health, metabolic, immune, and neurological conditions. We present a systematic comparison of 14 modern speech embedding models and show that embeddings extracted from these 30-second counting recordings outperform MFCCs and demographics in downstream health condition classification. We found that embeddings learned from a speaker identification model can predict objectively measured moderate-to-severe sleep apnea in males with an AUC of 0.64 $\pm$ 0.03, whereas MFCC and demographic features yielded AUCs of 0.56 $\pm$ 0.02 and 0.57 $\pm$ 0.02, respectively. Additionally, our results reveal gender-specific patterns in model effectiveness across different medical domains. For males, speaker identification and diarization models consistently outperformed speech foundation models for respiratory conditions (e.g., asthma: 0.61 $\pm$ 0.03 vs. 0.56 $\pm$ 0.02) and sleep-related conditions (insomnia: 0.65 $\pm$ 0.04 vs. 0.59 $\pm$ 0.05). For females, speaker diarization models performed best for smoking status (0.61 $\pm$ 0.02 vs. 0.55 $\pm$ 0.02), while for anxiety, Hebrew-specific models performed best when compared with speech foundation models (0.59 $\pm$ 0.02 vs. 0.58 $\pm$ 0.02). Our findings provide evidence that a simple counting task can support large-scale, multi-phenotypic voice screening and highlight which embedding families generalize best to specific conditions, insights that can guide future vocal biomarker research and clinical deployment.
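
The results above are reported as AUC $\pm$ a spread. The abstract does not say how that spread was obtained, so the following is only a minimal sketch of one common approach, assuming a bootstrap over held-out predictions. The function names (`roc_auc`, `bootstrap_auc`) and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_resamples=200, seed=0):
    """Mean and standard deviation of AUC over bootstrap resamples.

    Assumption: the spread is a bootstrap standard deviation; the paper
    may instead use cross-validation folds or another procedure.
    """
    if len(set(labels)) < 2:
        raise ValueError("need both classes to estimate AUC")
    rng = random.Random(seed)
    indices = range(len(labels))
    aucs = []
    while len(aucs) < n_resamples:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one of the classes
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Toy data only: 1 = condition present, scores = classifier outputs.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.5, 0.7]
mean_auc, std_auc = bootstrap_auc(toy_labels, toy_scores)
print(f"AUC = {mean_auc:.2f} +/- {std_auc:.2f}")
```

Under this reading, comparing feature sets (embeddings, MFCCs, demographics) would amount to running the same routine on each classifier's held-out scores and comparing the resulting means and spreads.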

Subject: Audio and Speech Processing

Published: 2025-05-22 10:22:15 UTC

2505.16490
