Improving X-Vector and PLDA for Text-Dependent Speaker Verification

Recently, the pipeline consisting of an x-vector speaker embedding front-end and a Probabilistic Linear Discriminant Analysis (PLDA) back-end has achieved state-of-the-art results in text-independent speaker verification. In this paper, we further improve the performance of the x-vector/PLDA-based system for text-dependent speaker verification by exploring which layer to extract the embedding from and by modifying the back-end training strategy. In particular, we find that x-vector-based embeddings, specifically the standard deviation statistics in the pooling layer, contain information related to both speaker characteristics and spoken content. Accordingly, we modify the back-end training labels to use both the speaker-id and the phrase-id. A correlation-alignment-based PLDA adaptation is also adopted to make use of text-independent labeled data during back-end training. Experimental results on the SDSVC 2020 dataset show that our proposed methods achieve significant performance improvements over the x-vector and HMM-based i-vector baselines.
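Two of the ideas above can be illustrated with a minimal Python sketch, under stated assumptions. `statistics_pooling` computes the per-dimension mean and standard deviation that an x-vector statistics-pooling layer concatenates over frames (the abstract reports that the standard deviation part carries both speaker and content information). `joint_labels` maps each (speaker-id, phrase-id) pair to a single integer class, one plausible way to feed combined labels to a back-end such as PLDA. Both function names and the plain-list input format are hypothetical; this is not the authors' implementation, and the PLDA training and correlation-alignment adaptation are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def statistics_pooling(
    frames: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Pool T x D frame-level activations into per-dimension mean and std.

    Hypothetical helper: a real x-vector network does this inside the
    model on tensors; plain lists are used here only for illustration.
    """
    num_frames = len(frames)
    if num_frames == 0:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    # First-order statistics: mean of each dimension over all frames.
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Second-order statistics: population standard deviation per dimension.
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return mean, std


def joint_labels(
    speaker_ids: Sequence[str], phrase_ids: Sequence[str]
) -> List[int]:
    """Assign one integer class per distinct (speaker-id, phrase-id) pair.

    Hypothetical helper: the same speaker saying different phrases ends up
    in different back-end classes.
    """
    classes: Dict[Tuple[str, str], int] = {}
    labels: List[int] = []
    for spk, phr in zip(speaker_ids, phrase_ids):
        key = (spk, phr)
        if key not in classes:
            classes[key] = len(classes)  # next unused class index
        labels.append(classes[key])
    return labels


if __name__ == "__main__":
    mean, std = statistics_pooling([[1.0, 2.0], [3.0, 2.0]])
    print(mean, std)  # [2.0, 2.0] [1.0, 0.0]
    print(joint_labels(["s1", "s1", "s2", "s1"], ["p1", "p2", "p1", "p1"]))
    # [0, 1, 2, 0]
```

With joint labels, utterances from speaker `s1` reading phrases `p1` and `p2` are treated as different classes, so the back-end learns to separate both speaker and phrase mismatches rather than speaker identity alone.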

Subject: INTERSPEECH.2020 - Analysis and Assessment

chen20d@interspeech_2020@ISCA
