P9DQ2IExgS@OpenReview


Synthesizing Software Engineering Data in a Test-Driven Manner

Authors: Lei Zhang, Jiaxi Yang, Min Yang, Jian Yang, Mouxiang Chen, Jiajun Zhang, Zeyu Cui, Binyuan Hui, Junyang Lin

We introduce **SWE-Flow**, a novel data synthesis framework grounded in Test-Driven Development (TDD). Unlike existing software engineering datasets, which rely on human-submitted issues, **SWE-Flow** automatically infers incremental development steps directly from unit tests, which inherently encapsulate high-level requirements. The core of **SWE-Flow** is the construction of a Runtime Dependency Graph (RDG), which precisely captures function interactions and enables the generation of a structured, step-by-step *development schedule*. At each step, **SWE-Flow** produces a partial codebase, the corresponding unit tests, and the necessary code modifications, resulting in fully verifiable TDD tasks. With this approach, we generated 16,061 training instances and 2,020 test instances from real-world GitHub projects, creating the **SWE-Flow-Eval** benchmark. Our experiments show that fine-tuning open models on this dataset significantly improves performance in TDD-based coding. To facilitate further research, we release all code, datasets, models, and Docker images on [GitHub](https://github.com/Hambaobao/SWE-Flow).
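
The idea of deriving a development schedule from a dependency graph can be illustrated with a minimal sketch. This is not the SWE-Flow implementation: the call trace, the function names, and the `development_schedule` helper are all hypothetical, and the sketch simply assumes that a function is implemented only after every function it calls at runtime. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical runtime trace (invented for illustration): each node maps to
# the set of functions it called while the unit tests were running.
calls = {
    "test_add": {"add"},
    "test_mean": {"mean"},
    "mean": {"add", "divide"},
    "add": set(),
    "divide": set(),
}


def development_schedule(calls):
    """Group non-test functions into ordered steps, callees before callers."""
    # TopologicalSorter treats each mapped set as predecessors, so a
    # function becomes ready only after everything it calls is done.
    sorter = TopologicalSorter(calls)
    sorter.prepare()
    steps = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        sorter.done(*ready)
        # Tests are not implementation targets, so keep only real functions.
        funcs = [name for name in ready if not name.startswith("test_")]
        if funcs:
            steps.append(funcs)
    return steps


schedule = development_schedule(calls)
for number, funcs in enumerate(schedule, start=1):
    print(f"step {number}: implement {', '.join(funcs)}")
```

In this toy trace, `add` and `divide` have no dependencies and land in the first step, while `mean` follows once both are available. Under the same assumption, each step could be paired with the tests whose dependencies are fully implemented to form a verifiable task.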

Subject: ICML.2025 - Poster