2509.20149

Total: 1

#1 Synergistic Enhancement of Requirement-to-Code Traceability: A Framework Combining Large Language Model based Data Augmentation and an Advanced Encoder [PDF] [Copy] [Kimi] [REL]

Authors: Jianzhang Zhang, Jialong Zhou, Nan Niu, Jinping Hua, Chuang Liu

Automated requirement-to-code traceability link recovery, essential for industrial system quality and safety, is critically hindered by the scarcity of labeled data. To address this bottleneck, this paper proposes and validates a synergistic framework that integrates large language model (LLM)-driven data augmentation with an advanced encoder. We first demonstrate that data augmentation, optimized through a systematic evaluation of bi-directional and zero/few-shot prompting strategies, is highly effective, while the choice among leading LLMs is not a significant performance factor. Building on the augmented data, we further enhance an established, state-of-the-art pre-trained language model based method by incorporating an encoder distinguished by a broader pre-training corpus and an extended context window. Our experiments on four public datasets quantify the distinct contributions of our framework's components: on its own, data augmentation consistently improves the baseline method, providing substantial performance gains of up to 26.66%; incorporating the advanced encoder provides an additional lift of 2.21% to 11.25%. This synergy culminates in a fully optimized framework with maximum gains of up to 28.59% on $F_1$ score and 28.9% on $F_2$ score over the established baseline, decisively outperforming ten established baselines from three dominant paradigms. This work contributes a pragmatic and scalable methodology to overcome the data scarcity bottleneck, paving the way for broader industrial adoption of data-driven requirement-to-code traceability.

Subject: Software Engineering

Publish: 2025-09-24 14:14:21 UTC