naseem25@interspeech_2025@ISCA


#1 Developing High-Quality TTS for Punjabi and Urdu: Benchmarking against MMS Models

Authors: Fatima Naseem, Maham Sajid, Farah Adeeba, Sahar Rauf, Asad Mustafa, Sarmad Hussain

Existing Punjabi text-to-speech (TTS) solutions focus on the Gurmukhi script, so Shahmukhi text must first be transliterated into Gurmukhi. This transliteration introduces letter substitutions and omissions, which result in pronunciation errors. In this study, a speech corpus, a phonetic lexicon, and a text analysis module were developed for Punjabi Shahmukhi. Two model architectures, Tacotron 1 and Tacotron 2 with WaveGlow, were used to build the TTS models. In addition to Punjabi, Urdu TTS models were also developed. These models were benchmarked against the Urdu and Punjabi Gurmukhi TTS models provided by Meta's Massively Multilingual Speech (MMS) project, a prominent multilingual speech initiative. Objective and subjective evaluations indicate that the Tacotron-based Urdu and Punjabi models outperform MMS in intelligibility, naturalness, and phonetic accuracy, improving TTS quality for these languages.

Subject: INTERSPEECH.2025 - Speech Synthesis