2024.naacl-demo.9@ACL

ATLAS: A System for PDF-centric Human Interaction Data Collection

Authors: Alexa Siu ; Zichao Wang ; Joshua Hoeflich ; Naman Kapasi ; Ani Nenkova ; Tong Sun

The Portable Document Format (PDF) is a popular format for distributing digital documents. Datasets on PDF reading behaviors and interactions remain limited due to the challenges of instrumenting PDF readers for these data collection tasks. We present ATLAS, a data collection tool designed to better support researchers in collecting rich PDF-centric datasets from users. ATLAS supports researchers in programmatically creating a user interface for data collection that is ready to share with annotators. It includes a toolkit and an extensible schema to easily customize the data collection tasks for a variety of purposes, allowing collection of PDF annotations (e.g., highlights, drawings) as well as reading behavior analytics (e.g., page scroll, text selections). We open-source ATLAS to support future research efforts and review use cases of ATLAS that showcase our system’s broad applicability.
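
To make the two kinds of collected data concrete, the following is a minimal sketch of how a session log mixing PDF annotations and reading behavior events might be represented. The abstract does not describe the ATLAS schema itself, so every name here (`InteractionEvent`, `Session`, `log`, the field names, and the event kind strings) is a hypothetical assumption chosen for illustration, not the ATLAS API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical event kinds, taken from the examples named in the abstract.
# These are illustrative assumptions, not the schema ATLAS defines.
ANNOTATION_KINDS = {"highlight", "drawing"}
BEHAVIOR_KINDS = {"page_scroll", "text_selection"}


@dataclass
class InteractionEvent:
    kind: str          # one of the kinds listed above
    page: int          # 1-based page number in the PDF
    timestamp_ms: int  # milliseconds since the session started
    payload: dict = field(default_factory=dict)  # kind-specific details


@dataclass
class Session:
    document_id: str
    annotator_id: str
    events: List[InteractionEvent] = field(default_factory=list)

    def log(self, kind: str, page: int, timestamp_ms: int, **payload) -> None:
        """Append one event, rejecting kinds the sketch does not know."""
        if kind not in ANNOTATION_KINDS | BEHAVIOR_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self.events.append(InteractionEvent(kind, page, timestamp_ms, payload))

    def annotations(self) -> List[InteractionEvent]:
        """Return only the events that are PDF annotations."""
        return [e for e in self.events if e.kind in ANNOTATION_KINDS]

    def to_json(self) -> str:
        """Serialize the whole session, including nested events."""
        return json.dumps(asdict(self))


# Example: one annotator reading one document.
session = Session(document_id="doc-001", annotator_id="annotator-A")
session.log("page_scroll", page=1, timestamp_ms=1200, direction="down")
session.log("highlight", page=2, timestamp_ms=5400, text="rich PDF-centric datasets")
session.log("text_selection", page=2, timestamp_ms=6100, text="extensible schema")
```

Keeping annotations and behavior events in a single time-ordered list is one possible design; it lets a researcher replay a session in order while still filtering by kind, as `annotations()` does.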