EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning


Authors: Siyuan Zhang, Jian Zong, Junyu Wang, Peiyuan Jiang, Jiahao Yan, Jingyu Zhang, Tianrui Wang, Xiaobao Wang, Longbiao Wang, Jianwu Dang

While large audio-language models (LALMs) show promise on audio question answering, they fail to focus on the question-relevant segments of the audio and to provide a clear, checkable reasoning process when handling complex audio reasoning. Reinforcement learning and tool-augmented prompting can help models better relate questions to audio, but they lack a reliable way to understand, integrate, and self-verify audio segments. To address this gap, we present EChO-Agent, a modular agent framework that reformulates complex audio QA as a workflow of planning, tool execution, evidence integration, and answer verification. Experiments on the MMAR benchmark show that EChO-Agent improves both accuracy and rubric scores over the baseline, and ablation studies show that evidence integration is the key factor.
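The four-stage workflow named in the abstract (planning, tool execution, evidence integration, answer verification) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`Evidence`, `plan`, `execute`, `integrate`, `verify`, `answer_question`, `STUB_TOOLS`) is hypothetical, the tools are string-returning stubs rather than real audio models, and the keyword-based planner and substring-based verifier are placeholder assumptions chosen only to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of an audio span


@dataclass
class Evidence:
    tool: str         # name of the tool that produced the finding
    segment: Segment  # audio span the tool was run on
    finding: str      # textual result returned by the tool


# Stub tools (assumption): real tools would run ASR or sound tagging on audio.
STUB_TOOLS: Dict[str, Callable[[Segment], str]] = {
    "transcribe": lambda seg: "a man says hello",
    "tag_sound": lambda seg: "dog barking",
}


def plan(question: str) -> List[Tuple[str, Segment]]:
    """Planning: choose (tool, segment) steps from the question (toy heuristic)."""
    steps: List[Tuple[str, Segment]] = [("transcribe", (0.0, 5.0))]
    if "sound" in question.lower():
        steps.append(("tag_sound", (5.0, 10.0)))
    return steps


def execute(steps: List[Tuple[str, Segment]],
            tools: Dict[str, Callable[[Segment], str]]) -> List[Evidence]:
    """Tool execution: run each planned tool on its segment."""
    return [Evidence(name, seg, tools[name](seg)) for name, seg in steps]


def integrate(evidence: List[Evidence]) -> str:
    """Evidence integration: order findings into one readable chain."""
    return " -> ".join(
        f"[{e.segment[0]:.0f}-{e.segment[1]:.0f}s] {e.tool}: {e.finding}"
        for e in evidence
    )


def verify(answer: str, evidence: List[Evidence]) -> bool:
    """Answer verification: accept only answers supported by some finding."""
    return any(answer.lower() in e.finding.lower() for e in evidence)


def answer_question(question: str, candidates: List[str],
                    tools: Dict[str, Callable[[Segment], str]]
                    ) -> Tuple[Optional[str], str]:
    """Run the four stages and return (verified answer or None, evidence chain)."""
    evidence = execute(plan(question), tools)
    chain = integrate(evidence)
    for candidate in candidates:
        if verify(candidate, evidence):
            return candidate, chain
    return None, chain
```

Calling `answer_question("What sound follows the speech?", ["cat meowing", "dog barking"], STUB_TOOLS)` returns the candidate supported by the collected evidence together with the chain of findings, which is the kind of checkable trace the abstract describes.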

Subjects: Audio and Speech Processing, Artificial Intelligence, Sound

Published: 2026-06-13 06:05:59 UTC

2606.15141
