2509.25458

Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition

Authors: Jiacheng Shi, Hongfei Du, Y. Alicia Hong, Ye Gao

Large audio-language models (LALMs) exhibit strong zero-shot performance across speech tasks but struggle with speech emotion recognition (SER) due to weak paralinguistic modeling and limited cross-modal reasoning. We propose Compositional Chain-of-Thought Prompting for Emotion Reasoning (CCoT-Emo), a framework that introduces structured Emotion Graphs (EGs) to guide LALMs in emotion inference without fine-tuning. Each EG encodes seven acoustic features (e.g., pitch, speech rate, jitter, shimmer), textual sentiment, keywords, and cross-modal associations. Embedded into prompts, EGs provide interpretable and compositional representations that enhance LALM reasoning. Experiments across SER benchmarks show that CCoT-Emo outperforms prior state-of-the-art (SOTA) methods and improves accuracy over zero-shot baselines.
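
As a rough illustration of how an Emotion Graph might be embedded into a prompt, the sketch below serializes a small graph to JSON and places it ahead of the classification question. The abstract does not give the actual EG schema or prompt template, so the `EmotionGraph` class, its field names, the `build_prompt` helper, and the relation strings are all hypothetical. Only acoustic features named in the abstract are used in the example; the full set of seven is not listed there.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class EmotionGraph:
    """Hypothetical EG container; not the schema used by the authors."""
    acoustic: dict[str, str]   # feature name -> coarse description
    sentiment: str             # textual sentiment of the transcript
    keywords: list[str]        # emotion-bearing words from the transcript
    # (source node, relation, target node) triples linking audio and text cues
    associations: list[tuple[str, str, str]] = field(default_factory=list)


def build_prompt(transcript: str, graph: EmotionGraph, labels: list[str]) -> str:
    """Embed the graph as JSON in a zero-shot prompt (assumed template)."""
    graph_json = json.dumps(asdict(graph), indent=2)
    return (
        "You are given a speech transcript and an Emotion Graph.\n"
        f"Transcript: {transcript}\n"
        f"Emotion Graph:\n{graph_json}\n"
        "Reason step by step over the graph, then answer with one label from: "
        f"{', '.join(labels)}."
    )


if __name__ == "__main__":
    graph = EmotionGraph(
        acoustic={"pitch": "high", "speech rate": "fast", "jitter": "elevated"},
        sentiment="negative",
        keywords=["never", "again"],
        associations=[("pitch:high", "reinforces", "sentiment:negative")],
    )
    print(build_prompt("I am never doing that again.", graph, ["angry", "sad", "happy", "neutral"]))
```

Keeping the graph as structured JSON rather than free text is one plausible way to obtain the interpretable, compositional representation the abstract describes; the resulting string would be passed to the LALM alongside the audio, with no fine-tuning involved.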

Subject: Artificial Intelligence

Publish: 2025-09-29 20:06:03 UTC