P18-2022@ACL


SNAG: Spoken Narratives and Gaze Dataset

Authors: Preethi Vaidyanathan; Emily T. Prud’hommeaux; Jeff B. Pelz; Cecilia O. Alm

Humans rely on multiple sensory modalities when examining and reasoning over images. In this paper, we describe a new multimodal dataset consisting of gaze measurements and spoken descriptions collected in parallel during an image inspection task. The task was performed by multiple participants on 100 general-domain images showing everyday objects and activities. We demonstrate the usefulness of the dataset by applying an existing visual-linguistic data fusion framework to label important image regions with appropriate linguistic labels.