2601.13007


ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs

Authors: Rusheng Pan, Bingcheng Mao, Tianyi Ma, Zhenhua Ling

Recovering accurate architecture from large-scale legacy software is hindered by architectural drift, missing relations, and the limited context of Large Language Models (LLMs). We present ArchAgent, a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis to reconstruct multiview, business-aligned architectures from cross-repository codebases. ArchAgent introduces scalable diagram generation with contextual pruning and integrates cross-repository data to identify business-critical modules. Evaluations on typical large-scale GitHub projects show significant improvements over existing benchmarks. An ablation study confirms that dependency context improves the accuracy of the architectures generated for production-level repositories, and a real-world case study demonstrates effective recovery of critical business logic from legacy projects. The dataset is available at https://github.com/panrusheng/arch-eval-benchmark.
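
The abstract names code segmentation and dependency context without giving details. As a rough illustration of the general idea only (not ArchAgent's actual method), the sketch below packs files into segments that fit a size budget and, for each segment, collects the dependencies that point outside it, which could then be supplied to an LLM as context. The function names, the size measure, and the greedy strategy are all assumptions made for this example.

```python
from typing import Dict, List, Set


def segment_files(sizes: Dict[str, int], budget: int) -> List[List[str]]:
    """Greedily pack files (in sorted path order) into segments within `budget`.

    Hypothetical helper: `sizes` maps a file path to an assumed size measure
    (for example, a token count). A file larger than the budget gets a
    segment of its own rather than being split.
    """
    segments: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in sorted(sizes):
        size = sizes[path]
        # Close the current segment if adding this file would exceed the budget.
        if current and used + size > budget:
            segments.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        segments.append(current)
    return segments


def external_deps(segment: List[str], deps: Dict[str, Set[str]]) -> Set[str]:
    """Return dependencies of `segment` that point to files outside it.

    Hypothetical helper: `deps` maps a file to the files it depends on,
    as a static analysis pass might report.
    """
    inside = set(segment)
    outside: Set[str] = set()
    for path in segment:
        outside |= deps.get(path, set()) - inside
    return outside


if __name__ == "__main__":
    sizes = {"a.py": 40, "b.py": 50, "c.py": 30, "d.py": 120}
    deps = {"a.py": {"b.py", "d.py"}, "c.py": {"a.py"}}
    for seg in segment_files(sizes, budget=100):
        print(seg, sorted(external_deps(seg, deps)))
```

A greedy, order-preserving packing is chosen here only because it is easy to follow; a real system would likely segment along module or dependency boundaries instead.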

Subjects: Software Engineering, Artificial Intelligence

Published: 2026-01-19 12:39:05 UTC