2025.findings-emnlp.794@ACL


MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization

Authors: Yinhong Liu, Jianfeng He, Hang Su, Ruixue Lian, Yi Nian, Jake W. Vincent, Srikanth Vishnubhotla, Robinson Piramuthu, Saab Mansour

Multimodal Dialogue Summarization (MDS) is a critical task with wide-ranging applications. To support the development of effective MDS models, robust automatic evaluation methods are essential for reducing both cost and human effort. However, such methods require a strong meta-evaluation benchmark grounded in human annotations. In this work, we introduce MDSEval, the first meta-evaluation benchmark for MDS, consisting of image-sharing dialogues, corresponding summaries, and human judgments across eight well-defined quality aspects. To ensure data quality and richness, we propose a novel filtering framework leveraging Mutually Exclusive Key Information (MEKI) across modalities. Our work is the first to identify and formalize key evaluation dimensions specific to MDS. Finally, we benchmark state-of-the-art evaluation methods, revealing their limitations in distinguishing summaries produced by advanced MLLMs and their susceptibility to various biases.

Subject: EMNLP.2025 - Findings