Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics

Authors: Yuriel Ryan, Rui Yang Tan, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Understanding humor is a core aspect of social intelligence, yet it remains a significant challenge for Large Multimodal Models (LMMs). We introduce PixelHumor, a benchmark dataset of 2,800 annotated multi-panel comics designed to evaluate LMMs' ability to interpret multimodal humor and recognize narrative sequences. Experiments with state-of-the-art LMMs reveal substantial gaps: for instance, top models achieve only 61% accuracy in panel sequencing, far below human performance. This underscores critical limitations in current models' integration of visual and textual cues for coherent narrative and humor understanding. By providing a rigorous framework for evaluating multimodal contextual and narrative reasoning, PixelHumor aims to drive the development of LMMs that better engage in natural, socially aware interactions.

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language

Published: 2025-09-12 01:39:24 UTC

arXiv: 2509.12248
