SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification

Self-Supervised Learning (SSL) has led to considerable progress in Speaker Verification (SV). The standard framework uses same-utterance positive sampling and data augmentation to generate anchor-positive pairs of the same speaker. This is a major limitation: because the anchor and positive share the same recording condition, this strategy primarily encodes channel information. We propose a new positive sampling technique to address this bottleneck: Self-Supervised Positive Sampling (SSPS). For a given anchor, SSPS aims to find an appropriate positive, i.e., one with the same speaker identity but a different recording condition, in the latent space using clustering assignments and a memory queue of positive embeddings. SSPS improves SV performance for both SimCLR and DINO, reaching 2.57% and 2.53% EER, respectively, and outperforming SOTA SSL methods on VoxCeleb1-O. In particular, SimCLR-SSPS achieves a 58% EER reduction by lowering intra-speaker variance, providing comparable performance to DINO-SSPS.
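
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea of picking a positive from a memory queue via clustering assignments. It assumes cosine-similarity assignment to precomputed centroids and a restriction to candidates in the anchor's own cluster; the names `sample_positive`, `queue_emb`, `queue_utt_ids`, and `centroids` are hypothetical, and the actual SSPS method may select clusters and candidates differently.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sample_positive(anchor, anchor_utt_id, queue_emb, queue_utt_ids, centroids, rng):
    """Return the queue index of a pseudo-positive for `anchor`, or None.

    Assumption (not taken from the paper): a candidate is any queued
    embedding assigned to the same cluster as the anchor but coming
    from a different utterance.
    """
    anchor = l2_normalize(np.asarray(anchor, dtype=float))
    queue_emb = l2_normalize(np.asarray(queue_emb, dtype=float))
    centroids = l2_normalize(np.asarray(centroids, dtype=float))
    queue_utt_ids = np.asarray(queue_utt_ids)

    # Assign the anchor and every queued embedding to the nearest centroid
    # (highest cosine similarity, since all vectors are unit length).
    anchor_cluster = int(np.argmax(centroids @ anchor))
    queue_clusters = np.argmax(queue_emb @ centroids.T, axis=1)

    # Keep same-cluster entries that do not come from the anchor's utterance.
    mask = (queue_clusters == anchor_cluster) & (queue_utt_ids != anchor_utt_id)
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        # The caller can fall back to the same-utterance positive here.
        return None
    return int(rng.choice(candidates))
```

In a training loop, the caller would pass a `numpy.random.Generator` (for example `np.random.default_rng(0)`) and, when the function returns `None`, fall back to the standard same-utterance positive.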

Subject: INTERSPEECH.2025 - Speech Detection

lepage25@interspeech_2025@ISCA
