Improving Model Factuality with Fine-grained Critique-based Evaluator

Authors: Yiqing Xie ; Wenxuan Zhou ; Pradyot Prakash ; Di Jin ; Yuning Mao ; Quintin Fettes ; Arya Talebzadeh ; Sinong Wang ; Han Fang ; Carolyn Rose ; Daniel Fried ; Hejia Zhang

Factuality evaluation aims to detect factual errors produced by language models (LMs) and thereby guide the development of more factual models. Toward this goal, we train a factuality evaluator, FenCE, that provides LM generators with claim-level factuality feedback. We apply data augmentation to a combination of public judgment datasets to train FenCE to (1) generate textual critiques along with scores and (2) make claim-level judgments based on diverse source documents obtained by various tools. We then present a framework that leverages FenCE to improve the factuality of LM generators by constructing training data. Specifically, we generate a set of candidate responses, use FenCE to revise and score each response without introducing lesser-known facts, and train the generator to prefer highly scored revised responses. Experiments show that our data augmentation methods improve the evaluator's accuracy by 2.9% on LLM-AggreFact. With FenCE, we improve Llama3-8B-chat's factuality rate by 14.45% on FActScore, outperforming state-of-the-art factuality finetuning methods by 6.96%.
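
The data-construction step described in the abstract (score candidate responses, then prefer the highly scored ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the mean aggregation of claim-level scores, the `min_margin` threshold, and the best-versus-worst pairing are all assumptions made for illustration, and the revision step is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A candidate response with hypothetical per-claim factuality scores in [0, 1]."""
    text: str
    claim_scores: List[float]

    @property
    def score(self) -> float:
        # Assumption: the response score is the mean of its claim-level scores.
        if not self.claim_scores:
            return 0.0
        return sum(self.claim_scores) / len(self.claim_scores)


def build_preference_pair(
    candidates: List[Candidate], min_margin: float = 0.1
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return (chosen, rejected), or None if the score gap is below min_margin.

    Assumption: pair the highest-scored candidate against the lowest-scored one.
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    if chosen.score - rejected.score < min_margin:
        return None
    return chosen, rejected


candidates = [
    Candidate("response A", [1.0, 1.0]),
    Candidate("response B", [1.0, 0.0]),
    Candidate("response C", [0.0, 0.0]),
]
pair = build_preference_pair(candidates)
```

Pairs produced this way could feed a preference-based training objective; which objective the paper uses is not stated in the abstract, so none is assumed here.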

Subject: Computation and Language

Published: 2024-10-24 01:41:02 UTC

arXiv: 2410.18359
