2025.findings-emnlp.1377@ACL

Total: 1

#1 Diverse Multi-tool Aggregation with Large Language Models for Enhanced Math Reasoning

Authors: Bohan Yao, Vikas Yadav

Tool usage is a proven technique for developing high-performance reasoning in large language models (LLMs). Our work emphasizes the utility of leveraging multiple diverse tools for complex reasoning tasks. We present Multi-TAG, a Multi-Tool AGgregation-based LLM framework that uses multiple diverse tools to solve complex math problems over multiple reasoning steps. At each reasoning step, Multi-TAG invokes multiple tools and accepts the step solution proposed by the tools whose final-answer estimates reach majority agreement. Multi-TAG strongly outperforms several standard baselines that use individual tools with the same number of runs, highlighting the importance of multi-tool invocation for solving complex reasoning tasks. We also show that even naive aggregation of multiple tools at each reasoning step yields substantial improvements of up to 35% accuracy. Multi-TAG then further improves these gains by 7.4% on average on MATH500, AIME, AMC, and OlympiadBench.
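
The abstract describes a per-step aggregation rule: several tools each propose a continuation of the solution, and a step is accepted only if the proposing tool's final-answer estimate agrees with the majority. A minimal Python sketch of that voting rule is below; the function and variable names (`aggregate_step`, `tools`, `partial_solution`) are illustrative assumptions, not identifiers from the paper or its released code.

```python
from collections import Counter

def aggregate_step(partial_solution, tools):
    """One reasoning step: each tool proposes a continuation plus a
    final-answer estimate; keep a step whose estimate matches the majority.

    `tools` is assumed to be a list of callables taking the partial solution
    (a list of step strings) and returning (step_text, final_answer_estimate).
    """
    candidates = []
    for tool in tools:
        step, answer_estimate = tool(partial_solution)
        candidates.append((step, answer_estimate))

    # Majority vote over the final-answer estimates produced by the tools.
    votes = Counter(answer for _, answer in candidates)
    majority_answer, _ = votes.most_common(1)[0]

    # Accept a step from a tool whose estimate agrees with the majority.
    for step, answer_estimate in candidates:
        if answer_estimate == majority_answer:
            return partial_solution + [step]
    return partial_solution
```

In this sketch the vote is over final-answer estimates rather than over the step text itself, matching the abstract's description of accepting steps "by tools that have majority agreement on the final answer estimate"; how ties are broken and how the final answer is extracted are left unspecified here.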

Subject: EMNLP.2025 - Findings