Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation


Authors: Xiangyu Zhang, Yu Zhou, Guang Yang, Wei Cheng, Taolue Chen

The advent of large language models has significantly advanced automatic code generation, transforming the way programmers write code. Inspired by natural language processing, mainstream code generation approaches represent code as a linear sequence of tokens. In this paper, we propose to represent code snippets as two-dimensional entities, in which both code lines and the tokens within each line are explicitly modeled. This representation allows us to capture the hierarchical and spatial structure of code, especially the dependencies between code lines. Our method, CoDE, introduces a dependency encoding approach that leverages dictionary learning to perform semantic matching between code lines. As a result, it avoids reliance on strict position indices, leading to better generalization to code with diverse contexts and lengths. We thoroughly evaluate CoDE on four categories of tasks. The experimental results demonstrate its generalizability, its context understanding and retrieval, and its interpretability in code generation.
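The abstract combines two ideas: viewing a snippet as a grid of lines by tokens, and scoring dependencies between lines through semantic matching against a dictionary instead of through position indices. The toy sketch below illustrates both ideas under our own assumptions; it is not CoDE's implementation. The regex tokenizer, the hashed bag-of-tokens line embedding, the softmax soft-coding over atoms, and the random dictionary standing in for a learned one are all hypothetical choices made for illustration.

```python
import re
import zlib

import numpy as np

# Hypothetical tokenizer: identifiers/numbers, or single punctuation characters
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_2d(code: str) -> list[list[str]]:
    """Two-dimensional view: one row per non-blank code line, one entry per token in it."""
    return [TOKEN_RE.findall(line) for line in code.splitlines() if line.strip()]


def embed_line(tokens: list[str], dim: int) -> np.ndarray:
    """Hypothetical line embedding: L2-normalized bag of hashed tokens."""
    vec = np.zeros(dim)
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def dictionary_codes(line_vecs: np.ndarray, dictionary: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Soft assignment of each line to dictionary atoms (softmax over similarities)."""
    logits = line_vecs @ dictionary.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def dependency_scores(code: str, dictionary: np.ndarray) -> np.ndarray:
    """Pairwise line-to-line scores computed from content alone; no position indices are used."""
    grid = to_2d(code)
    dim = dictionary.shape[1]
    vecs = np.stack([embed_line(row, dim) for row in grid])
    codes = dictionary_codes(vecs, dictionary)
    codes /= np.linalg.norm(codes, axis=1, keepdims=True)
    return codes @ codes.T  # cosine similarity between the lines' dictionary codes


rng = np.random.default_rng(0)
# Stand-in for a learned dictionary (assumption): 16 random unit-norm atoms of dimension 64
atoms = rng.normal(size=(16, 64))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

snippet = "x = load()\ny = x + 1\nprint(y)"
print(to_2d(snippet))
# prints [['x', '=', 'load', '(', ')'], ['y', '=', 'x', '+', '1'], ['print', '(', 'y', ')']]
print(np.round(dependency_scores(snippet, atoms), 3))
```

In this sketch the scores depend only on line content, so prepending lines to the snippet leaves the score between the same two lines unchanged. This is one way to read the abstract's claim that avoiding strict position indices helps with code of diverse contexts and lengths.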

Subject: ACL.2025 - Long Papers

2025.acl-long.308@ACL
