Objective Metrics for Evaluating Large Language Models Using External Data Sources

Authors: Haoze Du, Richard Li, Edward Gehringer

Evaluating the performance of Large Language Models (LLMs) is a critical yet challenging task, particularly when aiming to avoid subjective assessments. This paper proposes a framework that uses objective metrics derived from class textual materials from different semesters to assess LLM outputs on various tasks. By utilizing well-defined benchmarks, factual datasets, and structured evaluation pipelines, the approach ensures consistent, reproducible, and bias-minimized measurements. The framework emphasizes automation and transparency in scoring, reducing reliance on human interpretation while ensuring alignment with real-world applications. This approach addresses the limitations of subjective evaluation, providing a scalable solution for performance assessment in educational, scientific, and other high-stakes domains.

Subjects: Computation and Language, Machine Learning

Publish: 2025-08-01 02:24:19 UTC

2508.08277
