TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications


Authors: Sunwoo Lee, Daseong Jang, Dhammiko Arya, Gyoung-eun Han, Injee Song, SaeRom Kim, Sangjin Kim, Seojin Lee, Seokyoung Hong, Sereimony Sek, Seung-Mo Cho, Sohee Park, Sungbin Yoon, Wonbeom Jang, Eric Davis

As Large Language Models (LLMs) evolve into powerful agentic systems, the telecommunications industry’s expansion into AI services necessitates industry-grounded benchmarks to evaluate their underexplored domain-specific capabilities. To address the gap left by generic benchmarks that fail to assess realistic, non-English performance, we present TelAgentBench, a Korean benchmark for the telecommunications domain evaluating five core agentic capabilities: Reasoning, Planning, Action (tool-use), Retrieval-Augmented Generation, and Instruction Following. Evaluations reveal significant performance disparities between models that employ explicit reasoning and those that do not, providing actionable insights for deploying agentic LLMs in real-world telecommunications tasks.

Subject: EMNLP.2025 - Industry Track

2025.emnlp-industry.83@ACL
