EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS

Authors: Haoxun Li, Yu Liu, Yuqing Sun, Hanlei Shi, Leyuan Qu, Taihao Li

Recent LLM-based TTS systems achieve strong quality and zero-shot ability, but lack fine-grained emotional control due to their reliance on discrete speech tokens. Existing approaches either limit emotions to categorical labels or cannot generalize to LLM-based architectures. We propose EMORL-TTS (Fine-grained Emotion-controllable TTS with Reinforcement Learning), a framework that unifies global intensity control in the VAD space with local emphasis regulation. Our method combines supervised fine-tuning with reinforcement learning guided by task-specific rewards for emotion category, intensity, and emphasis. We further investigate how emphasis placement modulates fine-grained emotion intensity. Experiments show that EMORL-TTS improves emotion accuracy, intensity differentiation, and emphasis clarity, while preserving synthesis quality comparable to strong LLM-based baselines.

Subject: Sound

Publish: 2025-10-07 10:24:12 UTC

2510.05758
