2025.findings-naacl.176@ACL

Total: 1

#1 CodeRAG-Bench: Can Retrieval Augment Code Generation?

Authors: Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried

While language models (LMs) excel at generating code, many programs are difficult to generate using only parametric knowledge. Despite the success of retrieval-augmented generation (RAG) in text-centric tasks, its potential for code generation remains under-explored. This work introduces CodeRAG-Bench, a holistic retrieval-augmented code generation benchmark that covers basic programming, open-domain, and repository-level problems and provides reproducible evaluations of both retrieval and end-to-end code generation performance. We further create a diverse, open datastore for code retrieval, aggregating sources such as competition solutions, tutorials, library documentation, StackOverflow posts, and GitHub repositories. Based on CodeRAG-Bench, we conduct large-scale evaluations of 10 retrievers and 10 LMs, systematically analyzing when retrieval benefits code generation models and identifying remaining challenges. We find that while retrieving high-quality contexts improves code generation, retrievers often struggle to fetch useful contexts, and generators face limitations in using those contexts effectively. We hope CodeRAG-Bench encourages further development of code-oriented RAG methods.
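The abstract describes a two-stage pipeline: a retriever fetches contexts from a mixed datastore (documentation, StackOverflow posts, GitHub code, etc.), and a code LM generates a program conditioned on those contexts. The sketch below illustrates that loop under stated assumptions; the toy lexical retriever, the example datastore entries, and the `generate` stub are hypothetical placeholders, not the benchmark's actual retrievers or generators.

```python
# Minimal sketch of a retrieval-augmented code generation loop of the kind
# CodeRAG-Bench evaluates. All contents below are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Document:
    source: str   # e.g. "library documentation", "StackOverflow", "GitHub"
    text: str


# Toy open datastore mixing several source types, mirroring the aggregation
# described in the abstract (documentation, Q&A posts, repository code).
DATASTORE = [
    Document("library documentation",
             "pandas.DataFrame.merge joins two frames on key columns."),
    Document("StackOverflow",
             "Use collections.Counter to count hashable items in an iterable."),
    Document("GitHub repository",
             "def top_k(items, k): return sorted(items, reverse=True)[:k]"),
]


def retrieve(query: str, docs: list[Document], k: int = 2) -> list[Document]:
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(d.text.lower().split())), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def generate(prompt: str) -> str:
    """Placeholder for a code LM call; any generator under evaluation plugs in here."""
    return "# model-generated code would appear here\n"


def rag_codegen(problem: str) -> str:
    """Retrieve supporting contexts, prepend them to the problem, and generate code."""
    contexts = retrieve(problem, DATASTORE)
    context_block = "\n".join(f"# [{d.source}] {d.text}" for d in contexts)
    prompt = f"{context_block}\n\n# Task: {problem}\n"
    return generate(prompt)


if __name__ == "__main__":
    print(rag_codegen("count how many times each item appears in a list"))
```

In the benchmark's terms, the two failure modes the authors report map onto the two stages above: `retrieve` may fail to surface a useful document, and `generate` may fail to exploit the retrieved context even when it is relevant.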

Subject: NAACL.2025 - Findings