BannerBench: Benchmarking Vision Language Models for Multi-Ad Selection with Human Preferences


Authors: Hiroto Otake, Peinan Zhang, Yusuke Sakai, Masato Mita, Hiroki Ouchi, Taro Watanabe

Web banner advertisements, which are placed on websites to guide users to a targeted landing page (LP), are still often selected manually because human preferences are important in selecting which ads to deliver. To automate this process, we propose a new benchmark, BannerBench, to evaluate the human preference-driven banner selection process using vision-language models (VLMs). This benchmark assesses the degree of alignment with human preferences in two tasks: a ranking task and a best-choice task, both using sets of five images derived from a single LP. Our experiments show that VLMs are moderately correlated with human preferences on the ranking task. In the best-choice task, most VLMs perform close to chance level across various prompting strategies. These findings suggest that although VLMs have a basic understanding of human preferences, most of them struggle to pinpoint a single suitable option from many candidates.
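As a rough illustration of the two tasks, the sketch below scores one model against human judgments. It is not the authors' evaluation code: the abstract does not name the correlation measure, so Spearman's rank correlation is an assumption here, and the function names `spearman_rho` and `best_choice_accuracy` are hypothetical. The only detail taken from the abstract is the set size of five banners per LP, which puts chance level on the best-choice task at 1/5.

```python
from typing import Sequence

# Five candidate banners per landing page, as stated in the abstract.
SET_SIZE = 5
# Probability of picking the human-preferred banner by guessing uniformly.
CHANCE_LEVEL = 1 / SET_SIZE


def spearman_rho(human_rank: Sequence[int], model_rank: Sequence[int]) -> float:
    """Spearman correlation between two tie-free rankings of the same items.

    Assumption: the metric choice is illustrative; the abstract only says
    that models are "moderately correlated" with human preferences.
    """
    n = len(human_rank)
    if n != len(model_rank) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    # Sum of squared rank differences, item by item.
    d2 = sum((h - m) ** 2 for h, m in zip(human_rank, model_rank))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def best_choice_accuracy(human_best: Sequence[int], model_best: Sequence[int]) -> float:
    """Fraction of banner sets where the model picks the human-preferred banner."""
    if len(human_best) != len(model_best) or not human_best:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(h == m for h, m in zip(human_best, model_best))
    return hits / len(human_best)
```

A best-choice accuracy near `CHANCE_LEVEL` (0.2) would correspond to the "close to chance level" behavior the abstract reports, while a `spearman_rho` well above 0 but below 1 would correspond to moderate agreement on the ranking task.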

Subject: EMNLP.2025 - Findings

2025.findings-emnlp.1311@ACL
