Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback

Authors: Mohd Hozaifa Khan, Ravi Kiran Sarvadevabhatla

We introduce **Sketchtopia, a large-scale dataset and AI framework designed to explore goal-driven, multimodal communication through asynchronous interactions** in a Pictionary-inspired setup. Sketchtopia captures natural human interactions, including freehand sketches, open-ended guesses, and iconic feedback gestures, showcasing the complex dynamics of cooperative communication under constraints. It features over **20K gameplay sessions from 916 players, capturing 263K sketches, 10K erases, 56K guesses, and 19.4K iconic feedback gestures**. We introduce **multimodal foundational agents** with capabilities for generative sketching, guess generation, and asynchronous communication. Our dataset also includes **800 human-agent sessions** for benchmarking the agents. We introduce **novel metrics** to characterize collaborative success, responsiveness to feedback, and inter-agent asynchronous communication. Sketchtopia pushes the boundaries of multimodal AI, establishing **a new benchmark for studying asynchronous, goal-oriented interactions between humans and AI agents**.

Subject: CVPR.2025 - Poster