CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models

Authors: Tung-Thuy Pham, Duy-Quan Luong, Minh-Quan Duong, Trung-Hieu Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

Composable AI offers a scalable and effective paradigm for tackling complex AI tasks by decomposing them into sub-tasks and solving each sub-task using ready-to-use, well-trained models. However, the systematic evaluation of methods in this setting remains largely unexplored. In this paper, we introduce CABENCH, the first public benchmark comprising 70 realistic composable AI tasks, along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework to enable end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results illustrate the promise of composable AI in addressing complex real-world problems while highlighting the need for methods that can fully unlock its potential by automatically generating effective execution pipelines.

Subjects: Artificial Intelligence, Software Engineering

Published: 2025-08-04 13:48:32 UTC

2508.02427
