AgentArch: A Comprehensive Benchmark to Evaluate Agent Architectures in Enterprise

Authors: Tara Bogavelli, Roshnee Sharma, Hari Subramani

While individual components of agentic architectures have been studied in isolation, there remains limited empirical understanding of how different design dimensions interact within complex multi-agent systems. This study aims to address this gap by providing a comprehensive enterprise-specific benchmark evaluating 18 distinct agentic configurations across state-of-the-art large language models. We examine four critical agentic system dimensions: orchestration strategy, agent prompt implementation (ReAct versus function calling), memory architecture, and thinking tool integration. Our benchmark reveals significant model-specific architectural preferences that challenge the prevalent one-size-fits-all paradigm in agentic AI systems. It also reveals significant weaknesses in overall agentic performance on enterprise tasks, with the highest-scoring models achieving a maximum of only 35.3% success on the more complex task and 70.8% on the simpler task. We hope these findings inform the design of future agentic systems by enabling more empirically backed decisions regarding architectural components and model selection.

Subjects: Artificial Intelligence, Computation and Language, Multiagent Systems

Published: 2025-09-13 01:18:23 UTC

2509.10769
