DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications

2025.emnlp-industry.154@ACL

Authors: Joachim Daiber, Victor Maricato, Ayan Sinha, Andrew Rabinovich

We introduce DispatchQA, a benchmark to evaluate how well small language models (SLMs) translate open‐ended search queries into executable API calls via explicit function calling. Our benchmark focuses on the latency-sensitive e-commerce setting and measures SLMs’ impact on both search relevance and search latency. We provide strong, replicable baselines based on Llama 3.1 8B Instruct fine-tuned on synthetically generated data and find that fine-tuned SLMs produce search quality comparable to or better than that of large language models such as GPT-4o while achieving up to 3× faster inference. All data, code, and training checkpoints are publicly released to spur further research on resource‐efficient query understanding.
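
To make "translating a search query into an executable API call" concrete, the following is a minimal sketch of how a model's function-calling output might be validated before it is dispatched to a search backend. It is not taken from the benchmark: the function name `search_products`, its parameters (`query`, `max_price`, `category`), and the JSON layout with `name` and `arguments` keys are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical tool schema, assumed for illustration only;
# the benchmark's actual API and parameters may differ.
SEARCH_TOOL = {
    "name": "search_products",
    "parameters": {"query": str, "max_price": float, "category": str},
    "required": ["query"],
}


def parse_call(raw: str) -> dict:
    """Parse a model's JSON function call and return validated arguments."""
    call = json.loads(raw)
    if call.get("name") != SEARCH_TOOL["name"]:
        raise ValueError(f"unknown function: {call.get('name')!r}")

    args = call.get("arguments", {})
    for key in SEARCH_TOOL["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")

    validated = {}
    for key, value in args.items():
        expected = SEARCH_TOOL["parameters"].get(key)
        if expected is None:
            raise ValueError(f"unexpected argument: {key}")
        # JSON has a single number type, so accept integers where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"argument {key} should be {expected.__name__}")
        validated[key] = value
    return validated


# Assumed model output for the query "running shoes under 80 dollars".
raw_output = (
    '{"name": "search_products", '
    '"arguments": {"query": "running shoes", "max_price": 80}}'
)
print(parse_call(raw_output))
```

Rejecting malformed calls before they reach the backend is one plausible design choice in a latency-sensitive setting, since a failed search request costs a full round trip.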

Subject: EMNLP.2025 - Industry Track