F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task

Authors: Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, Dongyan Zhao

Figure-to-Text (F2T) tasks aim to convert structured figure information into natural language text, serving as a bridge between visual perception and language understanding. However, existing evaluation methods remain limited: 1) reference-based methods can only capture shallow semantic similarities and rely on costly labeled reference text; 2) reference-free methods depend on multimodal large language models, which suffer from low efficiency and instruction sensitivity; 3) existing methods provide only sample-level evaluations, lacking interpretability and alignment with expert-level multi-dimensional evaluation criteria. Accordingly, we propose F2TEval, a five-dimensional reference-free evaluation method aligned with expert criteria, covering faithfulness, completeness, conciseness, logicality, and analysis, to support fine-grained evaluation. We design a lightweight mixture-of-experts model that incorporates independent scoring heads and applies the Hilbert-Schmidt Independence Criterion to optimize the disentanglement of scoring representations across dimensions. Furthermore, we construct F2TBenchmark, a human-annotated benchmark dataset covering 21 chart types and 35 application domains, to support research on F2T evaluation. Experimental results demonstrate our model’s superior performance and efficiency, outperforming Gemini-2.0 and Claude-3.5 with only 0.9B parameters.
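The abstract names the Hilbert-Schmidt Independence Criterion (HSIC) as the tool for disentangling per-dimension scoring representations but does not give the estimator, kernel, or loss weighting. As a rough illustration only, the sketch below computes the standard biased empirical HSIC estimator, trace(KHLH) / (n - 1)^2, with Gaussian kernels, and sums it over all pairs of representations. The function names (`rbf_kernel`, `hsic`, `pairwise_hsic_penalty`), the kernel choice, and the bandwidth `sigma` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix for the rows of x, shape (n, d) -> (n, n)."""
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC between paired samples x and y (same row count)."""
    n = x.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n  # H = I - (1/n) 11^T
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    return float(np.trace(k @ centering @ l @ centering)) / (n - 1) ** 2


def pairwise_hsic_penalty(reps: list, sigma: float = 1.0) -> float:
    """Sum of HSIC over all unordered pairs of representations.

    Hypothetical use: each element of `reps` holds one scoring head's
    representations for a batch, and the sum serves as a regulariser that
    is smaller when the heads are closer to statistically independent.
    """
    total = 0.0
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            total += hsic(reps[i], reps[j], sigma)
    return total
```

In a training setup, a penalty like `pairwise_hsic_penalty` would typically be added to the scoring loss with some weight, using an autodiff framework instead of NumPy; how F2TEval actually combines the terms is not stated in the abstract.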

Subject: EMNLP.2025 - Main

2025.emnlp-main.195@ACL
