Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability

Authors: Yong Ren, Jingbei Li, Haiyang Sun, Yujie Chen, Cheng Yi, Yechang Huang, Hao Gu, Ye Bai, Xuerui Yang

Recent advances in Large Audio Language Models (LALMs) have extended Text-to-Speech (TTS) to interactive role-play scenarios, which demand high expressiveness and strict adherence to role-play instructions. However, existing models struggle to maintain stylistic consistency with character profiles and scene descriptions across multi-turn dialogues. A critical bottleneck is the lack of objective metrics for quantifying speaking style. To bridge this gap, we propose Mean Continuation Log-Probability (MCLP) as both an evaluation metric and a reward signal, validated on LALM-based Role-Play TTS (RP-TTS) tasks. Critically, we leverage the In-Context Learning capability of pre-trained LALMs to formulate MCLP via a continuation log-probability prediction. This metric quantifies stylistic consistency by measuring the likelihood of the ground-truth speech conditioned on the generated speech. Furthermore, we employ MCLP as a reinforcement learning reward to enhance the style alignment between generated speech and Role-Play instructions. To facilitate evaluation, we construct an RP-TTS dataset with rich scene and character annotations. Experimental results demonstrate that our method significantly outperforms strong LALM baselines on both objective and subjective metrics.
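The abstract does not give the exact formula for MCLP. As a minimal sketch, assume it is the average per-token log-probability that a pre-trained LALM assigns to the ground-truth speech tokens when the generated speech is placed in context before them. The names `step_logits` and `target_ids` are hypothetical, and obtaining the logits from a real LALM is left out; the sketch only illustrates the averaging step.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mean_continuation_logprob(
    step_logits: Sequence[Sequence[float]],
    target_ids: Sequence[int],
) -> float:
    """Average log-probability of the ground-truth continuation tokens.

    Assumption (not from the source): step_logits[t] holds the model's
    logits for continuation position t, computed with the generated speech
    as in-context prefix, and target_ids[t] is the ground-truth speech
    token at that position.
    """
    if len(step_logits) != len(target_ids) or len(target_ids) == 0:
        raise ValueError("need one logit vector per ground-truth token")
    total = 0.0
    for logits, token in zip(step_logits, target_ids):
        total += log_softmax(logits)[token]
    return total / len(target_ids)


# Toy example with a 4-token vocabulary and a 2-token continuation.
uniform = mean_continuation_logprob([[0.0] * 4, [0.0] * 4], [1, 3])
peaked = mean_continuation_logprob([[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 0.0, 5.0]], [1, 3])
```

Under this assumption, a higher value (closer to zero) means the ground-truth speech is a more likely continuation of the generated speech, which is the sense in which the abstract describes the two as stylistically consistent. The same scalar could then serve as a per-sample reward in reinforcement learning.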

Subject: Sound

Publish: 2026-01-30 07:27:48 UTC

arXiv: 2601.22661
