Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering

Authors: Jinghua Zhao, Hang Su, Lichun Fan, Zhenbo Luo, Hui Wang, Haoqin Sun, Yong Qin

With the rapid progress of large audio-language models (LALMs), audio question answering (AQA) has emerged as a challenging task requiring both fine-grained audio understanding and complex reasoning. While current methods mainly rely on constructing new datasets via captioning or reasoning traces, existing high-quality AQA data remains underutilized. To address this, we propose Omni-CLST, an error-aware Curriculum Learning framework with guided Selective Chain-of-Thought. The framework efficiently leverages existing high-quality datasets through two key strategies: an error-aware curriculum that organizes samples by difficulty, and a guided thought dropout mechanism that focuses reasoning on challenging cases. Experiments show that Omni-CLST achieves 73.80% on MMAU-mini and a new state of the art of 64.30% on MMAR, demonstrating robust generalization in multimodal audio-language understanding.
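
The abstract names the two strategies without giving their mechanics, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes that difficulty is measured by a per-sample error rate from a base model, that the curriculum runs from easy to hard, and that reasoning traces are kept only for samples above an error threshold. The names `Sample`, `error_rate`, `build_curriculum`, `drop_thoughts`, and the `threshold` value are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Sample:
    question: str
    answer: str
    thought: Optional[str]  # reasoning trace, if one is available
    error_rate: float       # ASSUMPTION: fraction of base-model attempts that were wrong


def build_curriculum(samples: List[Sample]) -> List[Sample]:
    """Order samples from easy to hard (ASSUMPTION: low error rate means easy)."""
    return sorted(samples, key=lambda s: s.error_rate)


def drop_thoughts(samples: List[Sample], threshold: float = 0.5) -> List[Sample]:
    """Keep the reasoning trace only for samples at or above the error threshold.

    ASSUMPTION: 'guided' dropout is modeled here as a deterministic rule driven
    by the error rate; the paper may use a different criterion.
    """
    return [
        s if s.error_rate >= threshold else replace(s, thought=None)
        for s in samples
    ]


demo = [
    Sample("Which instrument plays?", "violin", "The timbre is bowed...", 0.8),
    Sample("Is there speech?", "yes", "A voice is audible...", 0.1),
    Sample("How many speakers?", "two", "Two distinct pitches...", 0.5),
]
curriculum = drop_thoughts(build_curriculum(demo))
```

In this sketch the easy sample is trained as a direct answer, while the two harder samples retain their chain-of-thought, which is one plausible reading of "focuses reasoning on challenging cases".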

Subjects: Sound, Artificial Intelligence, Audio and Speech Processing

Published: 2025-09-14 06:54:12 UTC

2509.12275
