saget24@interspeech_2024@ISCA

#1 Lifelong Learning MOS Prediction for Synthetic Speech Quality Evaluation

Authors: Félix Saget, Meysam Shamsi, Marie Tahon

Mean Opinion Score (MOS) has long been the standard for perceptual evaluation of the quality of speech synthesis models; however, this criterion is hard to reproduce and costly. Automatic neural MOS predictors have emerged as a solution for the objective assessment of synthetic speech. These predictors are trained once on data collected from past listening tests, and thus may adapt poorly to new technological breakthroughs in speech synthesis. In this study, we investigate the applicability of lifelong learning to MOS predictors, in which training samples are fed to the model in chronological order. A sequential lifelong mode and a cumulative lifelong mode are compared with traditional batch training on the BVCC and Blizzard Challenge datasets. The experiments show the advantages of lifelong learning in cross-corpus evaluation as well as in a scenario with constrained data availability.
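The three training regimes differ mainly in which data the predictor sees at each stage. The sketch below only builds those per-stage training sets; it does not train a model. It assumes, as an illustration and not as the authors' definition, that "sequential" means each stage uses only the newest chronological chunk, "cumulative" means each stage uses all data seen so far, and that chunks are formed by year. The `Sample` tuple layout and the function names `group_by_year` and `training_schedule` are hypothetical. In both lifelong modes, the predictor would presumably be fine-tuned stage by stage, starting from the previous stage's weights.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample layout: (year, utterance_id, mos)
Sample = Tuple[int, str, float]


def group_by_year(samples: Sequence[Sample]) -> List[List[Sample]]:
    """Split samples into chunks, one per year, ordered from oldest to newest."""
    chunks: Dict[int, List[Sample]] = {}
    for s in samples:
        chunks.setdefault(s[0], []).append(s)
    return [chunks[year] for year in sorted(chunks)]


def training_schedule(samples: Sequence[Sample], mode: str) -> List[List[Sample]]:
    """Return the ordered list of training sets, one per training stage.

    Assumed semantics (an illustration, not taken from the paper):
      - "batch":      a single stage on all samples at once
      - "sequential": one stage per year, using only that year's samples
      - "cumulative": one stage per year, using all samples up to that year
    """
    chunks = group_by_year(samples)
    if mode == "batch":
        return [list(samples)]
    if mode == "sequential":
        return chunks
    if mode == "cumulative":
        stages: List[List[Sample]] = []
        seen: List[Sample] = []
        for chunk in chunks:
            seen = seen + chunk  # new list, so earlier stages are not mutated
            stages.append(seen)
        return stages
    raise ValueError(f"unknown mode: {mode}")


# Usage with toy, made-up samples
data = [(2019, "a", 3.2), (2016, "b", 2.1), (2019, "c", 4.0), (2022, "d", 4.5)]
for mode in ("batch", "sequential", "cumulative"):
    print(mode, [len(stage) for stage in training_schedule(data, mode)])
# batch [4]
# sequential [1, 2, 1]
# cumulative [1, 3, 4]
```

Under these assumptions, the cumulative mode revisits old data at every stage, while the sequential mode sees each sample only once. This is the usual trade-off between resistance to forgetting and training cost.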

Subject: INTERSPEECH.2024 - Speech Synthesis