gulzar25@interspeech_2025@ISCA


#1 Leveraging LLMs for Written to Spoken Style Data Transformation to Enhance Spoken Dialog State Tracking

Authors: Haris Gulzar, Monikka Roslianna Busto, Akiko Masaki, Takeharu Eda, Ryo Masumura

Dialog State Tracking (DST) is an important component of Task-Oriented Dialog (TOD) systems, as it must navigate complex human conversational flow to accomplish a task. Most TOD systems are trained on written-style text data, and their performance degrades sharply when deployed in spoken scenarios due to natural disfluencies and speech recognition errors. Labeled spoken-style TOD data is scarce because of high data collection costs and privacy concerns. As Large Language Models (LLMs) emerge as a tool for synthetic text data generation, we explored their capability to generate spoken-style text-based TOD data. By meticulously crafting LLM prompts, our generated labeled spoken-style TOD data improved Joint Goal Accuracy (JGA) by 3.39% absolute (11.6% relative) for dedicated DST models. In this work, we showcase our divide-and-conquer-based data generation strategies and DST training to improve the performance of task-specific dialog models.
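The written-to-spoken transformation described above can be sketched in miniature. The snippet below is a hypothetical illustration, not the authors' actual prompts or pipeline: it builds an LLM prompt that asks for a spoken-style rewrite of a labeled TOD turn while keeping slot values intact (so DST labels stay valid), and includes a toy rule-based stand-in for the LLM call that injects a filler and a word repeat.

```python
import random

# Hypothetical prompt template (an assumption, not from the paper): ask an LLM
# to rewrite a written-style turn in spoken style while preserving slot values
# so the dialog-state labels remain usable for DST training.
PROMPT_TEMPLATE = (
    "Rewrite the following written-style task-oriented dialog turn in a "
    "natural spoken style. Add mild disfluencies (fillers, restarts) and "
    "keep every slot value unchanged so the dialog-state labels stay valid.\n"
    "Turn: {turn}\n"
    "Slots: {slots}\n"
    "Spoken version:"
)

def build_prompt(turn: str, slots: dict) -> str:
    """Fill the template with one labeled dialog turn."""
    slot_str = ", ".join(f"{k}={v}" for k, v in sorted(slots.items()))
    return PROMPT_TEMPLATE.format(turn=turn, slots=slot_str)

def inject_disfluencies(text: str, seed: int = 0) -> str:
    """Toy stand-in for the LLM call: prepend a filler and repeat the first
    word, mimicking a false start in natural speech. Illustrative only."""
    rng = random.Random(seed)
    filler = rng.choice(["um,", "uh,", "well,"])
    words = text.split()
    return " ".join([filler, words[0] + ","] + words)

if __name__ == "__main__":
    prompt = build_prompt(
        "I need a cheap hotel in the north.",
        {"price": "cheap", "area": "north"},
    )
    print(prompt)
    print(inject_disfluencies("I need a cheap hotel in the north."))
```

In a real pipeline the prompt would be sent to an LLM and the returned spoken-style turn paired with the original dialog-state labels, yielding synthetic training data for the DST model.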

Subject: INTERSPEECH.2025 - Speech Synthesis