Chart2Code53: A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code Generation

Authors: Tianhao Niu, Yiming Cui, Baoxin Wang, Xiao Xu, Xin Yao, Qingfu Zhu, Dayong Wu, Shijin Wang, Wanxiang Che

Chart2code (chart-to-code generation) has recently received significant attention in the multimodal community due to its potential to reduce the burden of visualization and promote a more detailed understanding of charts. However, existing Chart2code training datasets suffer from at least one of the following issues: (1) limited scale, (2) limited chart-type coverage, and (3) inadequate complexity. To address these challenges, we seek more diverse sources that better align with real-world user distributions and propose dual data synthesis pipelines: (1) synthesis based on online plotting code and (2) synthesis based on chart images from academic papers. Using these pipelines, we create Chart2Code53, a large-scale Chart2code training dataset comprising 53 chart types and 130K chart-code pairs. Experimental results demonstrate that, even with few parameters, a model fine-tuned on Chart2Code53 achieves state-of-the-art performance among open-source models on multiple Chart2code benchmarks.

Subject: EMNLP.2025 - Main

2025.emnlp-main.799@ACL
