PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization

Authors: Xu Sun, Lionel Delphin-Poulat, Christèle Tarnec, Anastasia Shimorina

Large language models (LLMs) are increasingly used for zero-shot conversation summarization, but they often exhibit positional bias: a tendency to overemphasize content from the beginning or end of a conversation while neglecting the middle. To address this issue, we introduce PoSum-Bench, a comprehensive benchmark for evaluating positional bias in conversational summarization, featuring diverse English and French conversational datasets spanning formal meetings, casual conversations, and customer service interactions. We propose a novel semantic similarity-based sentence-level metric to quantify the direction and magnitude of positional bias in model-generated summaries, enabling systematic and reference-free evaluation across conversation positions, languages, and conversational contexts. Our benchmark and methodology thus provide the first systematic, cross-lingual framework for reference-free evaluation of positional bias in conversational summarization, laying the groundwork for developing more balanced and unbiased summarization models.
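The abstract does not spell out the metric, so the following is only an illustrative sketch of what a reference-free, sentence-level positional score could look like, not the authors' formulation. It assumes each summary sentence is attributed to its most similar conversation utterance, and that the mean normalized position of those utterances, shifted by 0.5, gives a signed score (negative leaning toward the start, positive toward the end). The names `bow`, `cosine`, and `position_bias` are hypothetical, and bag-of-words cosine similarity stands in for a real semantic similarity model.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (assumed stand-in for a sentence embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def position_bias(utterances, summary_sentences):
    """Hypothetical signed score in [-0.5, 0.5].

    Negative values mean the summary draws mostly on early utterances,
    positive values on late ones, and values near 0 on a balanced mix.
    """
    if len(utterances) < 2 or not summary_sentences:
        return 0.0
    vectors = [bow(u) for u in utterances]
    positions = []
    for sentence in summary_sentences:
        s = bow(sentence)
        sims = [cosine(s, v) for v in vectors]
        # Attribute the sentence to its most similar utterance (first on ties).
        best = max(range(len(sims)), key=sims.__getitem__)
        positions.append(best / (len(utterances) - 1))
    return sum(positions) / len(positions) - 0.5
```

Note that a mean-based score like this cannot separate a summary that covers the middle from one that covers both ends equally; capturing the "neglected middle" pattern would need the full distribution of attributed positions rather than its mean.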

Subject: EMNLP.2025 - Main

2025.emnlp-main.404@ACL
