DAME: Duration-Aware Matryoshka Embedding for Duration-Robust Speaker Verification


Authors: Youngmoon Jung, Joon-Young Yang, Ju-ho Kim, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho

Short-utterance speaker verification remains challenging due to limited speaker-discriminative cues in short speech segments. While existing methods focus on enhancing speaker encoders, the embedding learning strategy still forces a single fixed-dimensional representation reused for utterances of any length, leaving capacity misaligned with the information available at different durations. We propose Duration-Aware Matryoshka Embedding (DAME), a model-agnostic framework that builds a nested hierarchy of sub-embeddings aligned to utterance durations: lower-dimensional representations capture compact speaker traits from short utterances, while higher dimensions encode richer details from longer speech. DAME supports both training from scratch and fine-tuning, and serves as a direct alternative to conventional large-margin fine-tuning, consistently improving performance across durations. On the VoxCeleb1-O/E/H and VOiCES evaluation sets, DAME consistently reduces the equal error rate on 1-s and other short-duration trials, while maintaining full-length performance with no additional inference cost. These gains generalize across various speaker encoder architectures under both general training and fine-tuning setups.
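The nested structure described above can be illustrated with a minimal sketch. It shows only the general Matryoshka idea, in which each prefix of the full embedding is itself a usable embedding, paired with a duration-to-dimension mapping. The schedule `DURATION_TO_DIM`, the size `FULL_DIM`, and the helper names `prefix_dim_for` and `nested_score` are hypothetical and chosen for illustration; the abstract does not give the actual dimensions, duration boundaries, or training losses, and it does not say that embeddings are truncated at inference.

```python
import math

# Hypothetical schedule: (max duration in seconds, prefix dimension).
# These values are illustrative assumptions, not taken from the paper.
DURATION_TO_DIM = [(1.0, 64), (2.0, 128), (5.0, 192)]
FULL_DIM = 256  # assumed size of the full embedding


def prefix_dim_for(duration_s):
    """Return the nested prefix size assigned to an utterance duration."""
    for max_duration, dim in DURATION_TO_DIM:
        if duration_s <= max_duration:
            return dim
    return FULL_DIM


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nested_score(emb_a, emb_b, dim):
    """Score two embeddings using only their first `dim` coordinates."""
    return cosine(emb_a[:dim], emb_b[:dim])
```

In a training setup along these lines, a short crop would presumably supervise only its assigned prefix while longer crops supervise larger prefixes, so that the first coordinates carry the traits recoverable from little speech. That reading is an interpretation of the abstract, not a description of the authors' implementation.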

Subjects: Audio and Speech Processing, Artificial Intelligence

Published: 2026-01-20 14:20:44 UTC

arXiv: 2601.13999
