YyhRJXxbpi@OpenReview

Total: 1

#1 KGGen: Extracting Knowledge Graphs from Plain Text with Language Models

Authors: Belinda Mo, Kyssen Yu, Joshua Kazdan, Proud Mpala, Lisa Yu, Charilaos I. Kanatsoulis, Sanmi Koyejo

Recent interest in building foundation models for knowledge graphs has highlighted a fundamental challenge: knowledge graph data is scarce. The best-known knowledge graphs are primarily human-labeled, created by pattern matching, or extracted using early NLP techniques. While human-generated knowledge graphs are in short supply, automatically extracted ones are of questionable quality. We present KGGen, a novel text-to-knowledge-graph generator that uses language models to extract high-quality graphs from plain text. Unlike other KG generators, KGGen uses an entity resolution approach that clusters and de-duplicates related entities, significantly reducing the sparsity problem that plagues existing extractors. Along with KGGen, we release Measure of Information in Nodes and Edges (MINE), the first benchmark to test an extractor's ability to produce a useful KG from plain text. We benchmark our new tool against leading existing generators such as Microsoft's GraphRAG; we achieve comparable retrieval accuracy on the generated graphs and better information retention. Moreover, our graphs exhibit more concise and generalizable entities and relations. Our code is open-sourced at https://github.com/stair-lab/kg-gen/.
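
To make the two-stage idea in the abstract concrete (extract triples from plain text with a language model, then cluster and de-duplicate entities to reduce sparsity), here is a minimal, hypothetical sketch. It is not the kg-gen API: `extract_triples_with_lm` is a hard-coded stand-in for an LM call, and the clustering step uses simple string similarity rather than the paper's LM-based entity resolution.

```python
# Illustrative sketch of a text-to-KG pipeline with entity de-duplication.
# Stage 1: extract (subject, relation, object) triples from plain text.
# Stage 2: cluster near-duplicate entity surface forms onto canonical nodes.

from difflib import SequenceMatcher
from typing import Iterable

Triple = tuple[str, str, str]

def extract_triples_with_lm(text: str) -> list[Triple]:
    """Hypothetical placeholder: in practice an LM is prompted to emit triples."""
    # Hard-coded output so the sketch runs end to end without an LM.
    return [
        ("Marie Curie", "won", "Nobel Prize in Physics"),
        ("Curie, Marie", "born in", "Warsaw"),
        ("Marie Curie", "discovered", "polonium"),
    ]

def normalize(entity: str) -> str:
    """Crude canonical form: lowercase, drop punctuation, sort tokens."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in entity.lower())
    return " ".join(sorted(cleaned.split()))

def cluster_entities(entities: Iterable[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each surface form to an existing representative if similar enough."""
    canonical: dict[str, str] = {}
    representatives: list[str] = []
    for entity in entities:
        match = next(
            (rep for rep in representatives
             if SequenceMatcher(None, normalize(entity), normalize(rep)).ratio() >= threshold),
            None,
        )
        canonical[entity] = match if match is not None else entity
        if match is None:
            representatives.append(entity)
    return canonical

def build_graph(text: str) -> list[Triple]:
    triples = extract_triples_with_lm(text)
    entities = {s for s, _, _ in triples} | {o for _, _, o in triples}
    canon = cluster_entities(entities)
    # Rewrite triples onto canonical entities, dropping exact duplicates.
    return sorted({(canon[s], r, canon[o]) for s, r, o in triples})

if __name__ == "__main__":
    for triple in build_graph("…plain text about Marie Curie…"):
        print(triple)
```

In this toy run, "Marie Curie" and "Curie, Marie" collapse onto a single node, which is the sparsity-reduction effect the abstract attributes to entity resolution; the actual system performs this step with language models rather than string matching.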

Subject: NeurIPS.2025 - Poster