We introduce **Sketchtopia, a large-scale dataset and AI framework designed to explore goal-driven, multimodal communication through asynchronous interactions** in a Pictionary-inspired setup. Sketchtopia captures natural human interactions, including freehand sketches, open-ended guesses, and iconic feedback gestures, and so exposes the complex dynamics of cooperative communication under constraints. It features over **20K gameplay sessions from 916 players, capturing 263K sketches, 10K erases, 56K guesses and 19.4K iconic feedback gestures**. We also present **multimodal foundational agents** with capabilities for generative sketching, guess generation and asynchronous communication. The dataset additionally includes **800 human-agent sessions** for benchmarking these agents. We further propose **novel metrics** to characterize collaborative success, responsiveness to feedback and inter-agent asynchronous communication. Together, these contributions establish **a new benchmark for studying asynchronous, goal-oriented interactions between humans and AI agents**.