LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models

Authors: Hieu Tran, Junda Wang, Yujan Ting, Hong Yu, Weijing Huang, Terrence Chen

Large language models (LLMs) often struggle with factual accuracy in knowledge-intensive domains like healthcare. We introduce LEAF (Learning and Evaluation Augmented by Fact-Checking), a framework for improving LLM factuality in medical question answering. LEAF comprises three components: (1) RAFE, a robust fact-checking system using open-source LLMs and domain-specific retrieval to evaluate response accuracy; (2) Fact-Check-then-RAG, which leverages fact-checking results to guide retrieval without parameter updates; and (3) Learning from Fact Check, enabling self-training through supervised fine-tuning or preference-based learning using fact-checking as pseudo-labels. Experimental results show that RAFE outperforms Factcheck-GPT in detecting inaccuracies, Fact-Check-then-RAG effectively corrects errors, and Learning from Fact Check improves performance without labeled data. In a real-world healthcare deployment with proprietary medical documents, LEAF achieved an 83% improvement in factuality scores, demonstrating practical applicability for adapting general-purpose LLMs to organization-specific knowledge. Our framework provides a scalable solution for industrial applications requiring high factual accuracy.
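The abstract gives no implementation details, so the following is only a minimal sketch of how components (2) and (3) might be wired together, not the authors' method. Every name here (`Candidate`, `build_preference_pair`, `fact_check_then_rag`, the `generate`, `fact_check`, and `retrieve` callables, the thresholds, and the assumption that a fact-checker returns a score in [0, 1] plus a list of failed claims) is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    text: str
    factuality: float  # ASSUMPTION: fact-checker score in [0, 1]


def build_preference_pair(
    candidates: List[Candidate], min_margin: float = 0.2
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) texts using fact-check scores as pseudo-labels.

    Returns None when there are fewer than two candidates or when the score
    gap between best and worst is below min_margin (a hypothetical filter).
    """
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.factuality)
    worst, best = ranked[0], ranked[-1]
    if best.factuality - worst.factuality < min_margin:
        return None
    return best.text, worst.text


def fact_check_then_rag(
    question: str,
    generate: Callable[[str, List[str]], str],
    fact_check: Callable[[str], Tuple[float, List[str]]],
    retrieve: Callable[[str], List[str]],
    threshold: float = 0.8,
) -> str:
    """Answer once, fact-check, and retrieve only if the draft fails the check.

    No model parameters are updated; the fact-check result only decides
    whether and what to retrieve before regenerating.
    """
    draft = generate(question, [])
    score, failed_claims = fact_check(draft)
    if score >= threshold:
        return draft
    # Use the failed claims (not the original question) as retrieval queries.
    evidence = [doc for claim in failed_claims for doc in retrieve(claim)]
    return generate(question, evidence)
```

In this sketch the preference pairs could feed a preference-based trainer, while the highest-scoring responses alone could serve as supervised fine-tuning targets; which of these the paper uses, and how scores are computed, is described in the paper itself rather than in this abstract.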

Subject: EMNLP.2025 - Industry Track

2025.emnlp-industry.23@ACL
