Revisiting put-that-there, context aware window interactions via LLMs


Authors: Riccardo Bovo, Daniele Giunchi, Pasquale Cascarano, Eric J. Gonzalez, Mar Gonzalez-Franco

We revisit Bolt's classic "Put-That-There" concept for modern head-mounted displays by pairing Large Language Models (LLMs) with the XR sensor and technology stack. The agent fuses (i) a semantically segmented 3-D environment, (ii) live application metadata, and (iii) users' verbal, pointing, and head-gaze cues to issue JSON window-placement actions. As a result, users can manage a panoramic workspace through (1) explicit commands ("Place Google Maps on the coffee table"), (2) deictic speech plus gestures ("Put that there"), or (3) high-level goals ("I need to send a message"). Unlike traditional explicit interfaces, our system supports one-to-many action mappings and goal-centric reasoning, allowing the LLM to dynamically infer the relevant applications and layout decisions, including interrelationships across tools. This enables intent-driven interaction in immersive XR environments without manual window juggling.
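To make the idea of JSON window-placement actions concrete, the following is a minimal Python sketch of how the three input streams might be bundled for a prompt and how a model reply might be validated. The abstract does not specify the action schema, so every name here (`WindowAction`, `build_context`, `parse_actions`, and the `app`, `anchor`, and `op` fields) is a hypothetical assumption, and no LLM is actually called.

```python
import json
from dataclasses import dataclass


# Hypothetical schema: these field names are illustrative assumptions,
# not the format used by the authors.
@dataclass
class WindowAction:
    app: str          # application to place, e.g. "Google Maps"
    anchor: str       # label of a segmented surface, e.g. "coffee table"
    op: str = "place"


def build_context(surfaces, apps, utterance, pointed_at=None, gazed_at=None):
    """Bundle environment, application, and user cues into one JSON string."""
    return json.dumps({
        "surfaces": surfaces,          # (i) semantically segmented environment
        "apps": apps,                  # (ii) live application metadata
        "user": {                      # (iii) verbal, pointing, head-gaze cues
            "utterance": utterance,
            "pointed_at": pointed_at,
            "gazed_at": gazed_at,
        },
    })


def parse_actions(reply, surfaces, apps):
    """Parse a reply (a JSON list) and keep actions naming known apps and surfaces."""
    actions = []
    for item in json.loads(reply):
        action = WindowAction(item["app"], item["anchor"], item.get("op", "place"))
        if action.app in apps and action.anchor in surfaces:
            actions.append(action)
    return actions
```

Under these assumptions, a high-level goal such as "I need to send a message" could yield a reply containing several actions (for example, one for a messaging app and one for a contacts app), which is one way to picture the one-to-many mapping described above. Filtering against the known surfaces and applications is a design choice of this sketch, intended to discard references the model might invent.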

Subject: Human-Computer Interaction

Publish: 2025-11-04 08:58:30 UTC

2511.02378
