WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

Authors: Wenhao Wu ; Wei Li ; Xinyan Xiao ; Jiachen Liu ; Sujian Li ; Yajuan Lyu

A crucial issue with current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Due to a lack of annotated data, existing factual consistency metrics usually train evaluation models on synthetic texts or directly transfer from other related tasks, such as question answering (QA) and natural language inference (NLI). Bias in the synthetic text or the upstream tasks makes them perform poorly on text actually generated by language models, especially when used for general evaluation across various tasks. To alleviate this problem, we propose a weakly supervised framework named WeCheck that is trained directly on actual generated samples from language models with weakly annotated labels. WeCheck first utilizes a generative model to infer the factual labels of generated samples by aggregating weak labels from multiple resources. Next, we train a simple noise-aware classification model as the target metric using the inferred weakly supervised information. Comprehensive experiments on various tasks demonstrate the strong performance of WeCheck, which achieves an average absolute improvement of 3.3% on the TRUE benchmark over state-of-the-art methods with 11B parameters while using only 435M parameters. Furthermore, it is up to 30 times faster than previous evaluation methods, greatly improving the accuracy and efficiency of factual consistency evaluation.
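
The two-stage idea in the abstract (aggregate weak labels, then train a noise-aware classifier on the result) can be illustrated with a minimal sketch. This is not the authors' implementation: plain averaging stands in for the generative labeling model, a one-feature logistic regression stands in for the 435M-parameter classifier, and every name, score, and feature value below is a hypothetical chosen for illustration.

```python
import math

def aggregate_weak_labels(weak_scores):
    """Average per-source consistency scores in [0, 1] into one soft label.
    Assumption: simple averaging replaces the paper's generative labeling model."""
    return sum(weak_scores) / len(weak_scores)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_cross_entropy(p_pred, p_target, eps=1e-12):
    """Binary cross-entropy against a probabilistic target rather than a hard 0/1 label."""
    p_pred = min(max(p_pred, eps), 1.0 - eps)
    return -(p_target * math.log(p_pred) + (1.0 - p_target) * math.log(1.0 - p_pred))

def predict(w, b, x):
    """Probability that sample x is factually consistent under the toy model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_noise_aware(features, soft_labels, lr=0.5, epochs=500):
    """Fit logistic regression on soft labels with batch gradient descent.
    Soft targets let uncertain weak labels contribute weaker gradients."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, soft_labels):
            err = predict(w, b, x) - y  # gradient of soft cross-entropy w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical weak scores from three sources (e.g. NLI, QA, synthetic-data metrics).
WEAK_SCORES = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.8, 0.9, 0.6], [0.1, 0.3, 0.2]]
# Hypothetical single feature per generated sample (e.g. overlap with the source text).
FEATURES = [[0.9], [0.1], [0.8], [0.2]]

if __name__ == "__main__":
    soft_labels = [aggregate_weak_labels(s) for s in WEAK_SCORES]
    w, b = train_noise_aware(FEATURES, soft_labels)
    for x in FEATURES:
        print(x, round(predict(w, b, x), 3))
```

Training on soft labels rather than thresholded ones is one simple way to keep label uncertainty in the loss; the paper's actual noise-aware objective may differ.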

2023.acl-long.18@ACL
