Speech audiometry, which assesses hearing disorders, typically relies on audiologists, which makes the process subjective and requires in-person evaluation. In this paper, we introduce SylPh, a novel automatic syllable-level mispronunciation detection and diagnosis (MDD) model that generalizes to open-set syllables and also provides phoneme-level analysis. To capture a wide range of mispronunciation patterns, we construct positive and pseudo-negative bags that extract in-distribution and out-of-distribution features from the input audio. Our model aligns audio features with adaptive text embeddings using a contrastive objective, dynamically adjusting the decision boundary for each syllable within a single model. Extensive experiments on a large-scale dataset demonstrate its effectiveness on both closed-set and open-set syllables. Notably, despite being trained only on syllable-level labels, SylPh can localize phoneme-level abnormalities, providing detailed diagnostic insights.
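The contrastive alignment described above can be illustrated with a generic sketch. This is not SylPh's implementation. It assumes an InfoNCE-style loss in which one audio embedding is pulled toward the text embedding of its target syllable (the positive) and pushed away from other embeddings (standing in for pseudo-negatives). It also assumes that each syllable's text embedding acts as its own classifier weight, so a single similarity threshold yields a different decision boundary per syllable. The function names, the temperature of 0.1, and the threshold of 0.5 are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    audio: Sequence[float],
    positive_text: Sequence[float],
    negative_texts: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Generic InfoNCE loss: -log softmax probability of the positive pair.

    Assumption: the positive is the target syllable's text embedding and the
    negatives stand in for pseudo-negative samples.
    """
    logits = [cosine(audio, positive_text) / temperature]
    logits += [cosine(audio, t) / temperature for t in negative_texts]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def is_mispronounced(
    audio: Sequence[float],
    target_text: Sequence[float],
    threshold: float = 0.5,  # hypothetical global threshold
) -> bool:
    """Flag a mispronunciation when the audio is not close enough to the
    target syllable's text embedding. Because each syllable has its own
    embedding, the effective decision boundary differs per syllable."""
    return cosine(audio, target_text) < threshold


# Usage: audio aligned with its target syllable yields a near-zero loss.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(f"aligned loss={aligned:.5f}, misaligned loss={misaligned:.5f}")
```

In this sketch, open-set syllables need no new model parameters: a new syllable only needs a text embedding, which is one plausible reading of how a text-conditioned model can generalize beyond the syllables seen during training.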