Check-Eval: A Checklist-based Approach for Evaluating Text Quality


Authors: Jayr Pereira, Roberto Lotufo

Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose Check-Eval, a novel evaluation framework that uses LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be employed as either a reference-free or a reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results demonstrate that Check-Eval achieves higher correlations with human judgments than existing metrics such as G-Eval and GPTScore, underscoring its potential as a more reliable and effective evaluation framework for natural language generation tasks. The code for our experiments is available at https://anonymous.4open.science/r/check-eval-0DB4.
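
The two stages named in the abstract, checklist generation and checklist evaluation, can be pictured with a minimal Python sketch. This is not the authors' implementation: the function names, the prompt wording, the `LLM` callable type, and the choice to score a candidate as the fraction of checklist items it satisfies are all assumptions made here for illustration. The caller decides which text the checklist is derived from, for example a reference text in the reference-dependent setting.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a model reply.
LLM = Callable[[str], str]


def generate_checklist(basis_text: str, llm: LLM) -> List[str]:
    """Stage 1 (assumed form): ask the model for key points, one per line."""
    prompt = (
        "List the key points that the following text conveys, one per line.\n\n"
        + basis_text
    )
    reply = llm(prompt)
    # Drop bullet markers and blank lines from the reply.
    items = [line.strip("-* \t") for line in reply.splitlines()]
    return [item for item in items if item]


def evaluate_checklist(checklist: List[str], candidate: str, llm: LLM) -> float:
    """Stage 2 (assumed form): one yes/no question per checklist item.

    Returns the fraction of items the model judges to be present in the
    candidate. This scoring rule is an assumption, not taken from the paper.
    """
    if not checklist:
        return 0.0
    satisfied = 0
    for item in checklist:
        prompt = (
            f"Point: {item}\n"
            f"Candidate: {candidate}\n"
            "Does the candidate convey the point? Answer yes or no."
        )
        if llm(prompt).strip().lower().startswith("yes"):
            satisfied += 1
    return satisfied / len(checklist)


def checklist_score(basis_text: str, candidate: str, llm: LLM) -> float:
    """Run both stages: build a checklist from basis_text, then score candidate."""
    return evaluate_checklist(generate_checklist(basis_text, llm), candidate, llm)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real setup would wrap an API call in a function with this signature.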

Subjects: Computation and Language, Artificial Intelligence

Published: 2024-07-19 17:14:16 UTC

arXiv: 2407.14467
