ruida22@interspeech_2022@ISCA

Total: 1

#1 Adaptive Rectangle Loss for Speaker Verification [PDF] [Copy] [Kimi2]

Authors: Li Ruida ; Fang Shuo ; Ma Chenguang ; Li Liang

From the perspective of pair similarity optimization, speaker verification is expected to satisfy the criterion that each intraclass similarity is higher than the maximal inter-class similarity. However, we find that most softmax-based losses are suboptimal which encourages each sample to have a higher target similarity score only than its corresponding non-target similarity scores but not all the non-target ones. To this end, we propose a batch-wise maximum softmax loss, in which the non-target logits are replaced by the ones derived from the whole batch. To further emphasize the minority hard non-target pairs, an adaptive margin mechanism is introduced at the same time. The proposed loss is named Adaptive Rectangle loss due to its rectangle decision boundary. In addition, an annealing strategy is introduced to improve the stability of the training process and boost the convergence. Experimentally, we demonstrate the superiority of adaptive rectangle loss on speaker verification tasks. Results on VoxCeleb show that our proposed loss outperforms state-of-the-art by 10.11% in EER.