Dataflow-Guided Neuro-Symbolic Language Models for Type Inference

Authors: Gen Li, Yao Wan, Hongyu Zhang, Zhou Zhao, Wenbin Jiang, Xuanhua Shi, Hai Jin, Zheng Wang

Language Models (LMs) are increasingly used for type inference, aiding in error detection and software development. Some real-world deployments of LMs require the model to run on local machines to safeguard the intellectual property of the source code. This setting often limits the size of the LMs that can be used. We present Nester, the first neuro-symbolic approach that enhances LMs for type inference by integrating symbolic learning without increasing model size. Nester breaks type inference into sub-tasks based on the data and control flow of the input code, encoding them as a modular high-level program. This program executes multi-step actions, such as evaluating expressions and analyzing conditional branches of the target code, combining static typing with LMs to infer potential types. Evaluated on the ManyTypes4Py dataset in Python, Nester outperforms two state-of-the-art type inference methods (HiTyper and TypeGen), achieving 70.7% Top-1 Exact Match, which is 18.3% and 3.6% higher than HiTyper and TypeGen, respectively. For complex type annotations like typing.Optional and typing.Union, Nester achieves 51.0% and 16.7%, surpassing TypeGen by 28.3% and 5.8%.
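
The idea of splitting inference along conditional branches and combining static typing with an LM can be illustrated with a toy sketch. This is not Nester's implementation: the function names (`static_type`, `merge`, `infer_return_type`) and the `lm_guess` callback are hypothetical, and the "LM" here is just a stub passed in by the caller. The sketch assumes Python 3.9+ (for `ast.unparse`) and only handles return statements of a single top-level function.

```python
import ast
from typing import Callable, List, Optional


def static_type(expr: Optional[ast.expr]) -> Optional[str]:
    """Symbolic step: name the type of a literal expression, or give up."""
    if expr is None:  # bare `return`
        return "None"
    if isinstance(expr, ast.Constant):
        return "None" if expr.value is None else type(expr.value).__name__
    if isinstance(expr, ast.List):
        return "list"
    if isinstance(expr, ast.Dict):
        return "dict"
    return None  # not resolvable by this simple rule set


def merge(types: List[str]) -> str:
    """Combine per-branch types into a single annotation string."""
    unique = sorted(set(types))
    if len(unique) == 1:
        return unique[0]
    non_none = [t for t in unique if t != "None"]
    if len(non_none) < len(unique):  # None was one of the branch types
        if len(non_none) == 1:
            inner = non_none[0]
        else:
            inner = f"typing.Union[{', '.join(non_none)}]"
        return f"typing.Optional[{inner}]"
    return f"typing.Union[{', '.join(unique)}]"


def infer_return_type(source: str, lm_guess: Callable[[str], str]) -> str:
    """Visit every return in the function; fall back to the LM stub when
    the static rules cannot decide (hypothetical division of labour)."""
    func = ast.parse(source).body[0]
    branch_types = []
    for node in ast.walk(func):
        if isinstance(node, ast.Return):
            inferred = static_type(node.value)
            if inferred is None:
                inferred = lm_guess(ast.unparse(node.value))
            branch_types.append(inferred)
    return merge(branch_types) if branch_types else "None"


example = '''
def find(items, key):
    if key in items:
        return items[key]
    return None
'''

# The lambda stands in for a language model that guesses "str".
print(infer_return_type(example, lambda expr: "str"))  # typing.Optional[str]
```

In this sketch the `return None` branch is resolved statically, while `items[key]` is delegated to the stub, and the two branch results are merged into a `typing.Optional` annotation. The real system's sub-task decomposition and program encoding are described in the paper, not here.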

Subject: ICML.2025 - Poster

o5D8i2zZ1l@OpenReview
