2025.acl-long.515@ACL

MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies

Authors: Yang Liu, Jiahuan Cao, Hiuyi Cheng, Yongxin Shi, Kai Ding, Lianwen Jin

Despite the rapid development of Multimodal Large Language Models (MLLMs), their potential in Chinese Classical Studies (CCS), a field that plays a vital role in preserving and promoting China’s rich cultural heritage, remains largely unexplored due to the absence of specialized benchmarks. To bridge this gap, we propose MCS-Bench, the first multimodal benchmark specifically designed for CCS across multiple subdomains. MCS-Bench spans seven core subdomains (Ancient Chinese Text, Calligraphy, Painting, Oracle Bone Script, Seal, Cultural Relic, and Illustration), with a total of 45 meticulously designed tasks. Through extensive evaluation of 37 representative MLLMs, we observe that even the top-performing model (InternVL2.5-78B) achieves an average score below 50, indicating substantial room for improvement. Our analysis reveals significant performance variations across tasks and identifies critical challenges in areas such as Optical Character Recognition (OCR) and cultural context interpretation. MCS-Bench not only establishes a standardized baseline for CCS-focused MLLM research but also provides valuable insights for advancing cultural heritage preservation and innovation in the Artificial General Intelligence (AGI) era. Data and code will be publicly available.

Subject: ACL.2025 - Long Papers