Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models

Authors: Longhao Li, Hongjie Chen, Zehan Li, Qihan Hu, Jian Kang, Jie Li, Lei Xie, Yongxiang Li

Recent advances in reasoning models have driven significant progress in text and multimodal domains, yet audio reasoning remains relatively limited. Only a few Large Audio Language Models (LALMs) incorporate explicit Chain-of-Thought (CoT) reasoning, and their capabilities are often inconsistent and insufficient for complex tasks. To bridge this gap, we introduce Audio-Cogito, a fully open-source solution for deep audio reasoning. We develop Cogito-pipe, a pipeline for curating high-quality audio reasoning data, and use it to produce 545k reasoning samples, which will be released after review. Based on this dataset, we adopt a self-distillation strategy for model fine-tuning. Experiments on the MMAR benchmark, the only audio benchmark that evaluates the CoT process, show that our model achieves the best performance among open-source models and matches or surpasses certain closed-source models on specific metrics. Our approach also ranks among the top-tier systems in the Interspeech 2026 Audio Reasoning Challenge.

Subject: Audio and Speech Processing

Published: 2026-04-14 10:00:39 UTC

2604.12527
