2025.emnlp-industry.28@ACL

Total: 1

#1 Select-then-Route: Taxonomy guided Routing for LLMs

Authors: Soham Shah, Kumar Shridhar

Recent advances in large language models (LLMs) have boosted performance across a broad spectrum of natural-language tasks, yet no single model excels uniformly across domains. Sending each query to the most suitable model mitigates this limitation, but deciding among *all* available LLMs for every query is prohibitively expensive. Both accuracy and latency can improve if the decision space is first narrowed and a suitable model is then selected for the given query. We introduce Select-then-Route (StR), a two-stage framework that first *selects* a small, task-appropriate pool of LLMs and then *routes* each query within that pool through an adaptive cascade. StR first employs a lightweight, *taxonomy-guided selector* that maps each query to models proven proficient for its semantic class (e.g., reasoning, code, summarisation). Within the selected pool, a *confidence-based cascade* begins with the cheapest model and escalates only when a multi-judge agreement test signals low reliability. Across six public benchmarks spanning various domains, StR improves end-to-end accuracy from 91.7% (best single model) to 94.3% while reducing inference cost by 4X. Because both the taxonomy and the multi-judge evaluation thresholds are tunable, StR exposes a smooth cost–accuracy frontier, enabling users to dial in the trade-off that best fits their latency and budget constraints.
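
The abstract describes a two-stage control flow: a taxonomy-guided selector narrows the candidate pool, then a confidence-based cascade starts with the cheapest model and escalates only when multi-judge agreement is low. The Python sketch below is a minimal illustration of that flow under stated assumptions, not the authors' implementation: the taxonomy table (`TAXONOMY_POOLS`), the model names, the helper callables (`classify_query`, `call_model`, `judge_agreement`), and the 0.8 agreement threshold are all hypothetical placeholders.

```python
# Minimal sketch of the Select-then-Route (StR) idea from the abstract.
# All names and values here are illustrative placeholders, not the paper's
# actual taxonomy, models, or thresholds.

from typing import Callable, Dict, List, Tuple

# Hypothetical taxonomy: each semantic class maps to a pool of models
# ordered from cheapest to most expensive (an assumption for illustration).
TAXONOMY_POOLS: Dict[str, List[str]] = {
    "reasoning":     ["small-reasoner", "mid-reasoner", "large-reasoner"],
    "code":          ["small-coder", "large-coder"],
    "summarisation": ["small-summariser", "large-summariser"],
}

def select_then_route(
    query: str,
    classify_query: Callable[[str], str],          # lightweight taxonomy-guided selector
    call_model: Callable[[str, str], str],         # (model_name, query) -> answer
    judge_agreement: Callable[[str, str], float],  # (query, answer) -> agreement in [0, 1]
    agreement_threshold: float = 0.8,              # tunable reliability threshold (assumed)
) -> Tuple[str, str]:
    """Route `query` through a taxonomy-selected pool with a confidence cascade."""
    # Stage 1: select a small, task-appropriate pool of models.
    semantic_class = classify_query(query)
    pool = TAXONOMY_POOLS.get(semantic_class, TAXONOMY_POOLS["reasoning"])

    # Stage 2: cascade, starting with the cheapest model and escalating
    # only when multi-judge agreement signals low reliability.
    answer = ""
    for model in pool:
        answer = call_model(model, query)
        if judge_agreement(query, answer) >= agreement_threshold:
            return model, answer  # confident enough; stop escalating
    # Fall back to the last (most capable) model's answer.
    return pool[-1], answer

# Toy usage with stub components (purely illustrative):
if __name__ == "__main__":
    model_tag, answer = select_then_route(
        "Summarise this report in two sentences.",
        classify_query=lambda q: "summarisation" if "summarise" in q.lower() else "reasoning",
        call_model=lambda m, q: f"[{m}] answer to: {q}",
        judge_agreement=lambda q, a: 0.9,  # stub judges always agree
    )
    print(model_tag, answer)
```

In this sketch, raising `agreement_threshold` or enlarging the per-class pools pushes more queries toward larger models, which is one way the tunable taxonomy and judging thresholds could expose the cost–accuracy trade-off described in the abstract.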

Subject: EMNLP.2025 - Industry Track