Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

Authors: Allison Andreyev, Landon Eum, Nestor Tiglao, Romel Gomez

For robots to be integrated effectively into household or industrial environments, they must adapt to natural-language prompts in real time. Although Vision-Language Models (VLMs) have enabled zero-shot generalization in robot task and motion planning (TAMP), current state-of-the-art approaches often remain computationally "heavyweight" or require extensive training on thousands of demonstrations. We present GRASP (Grounded Reasoning and Symbolic Planning), a framework designed as a step toward open-vocabulary tabletop manipulation. Our approach leverages a pretrained VLM to translate natural-language queries into neuro-symbolic goal states, which are grounded in the physical world via a bounding-box detection pipeline. Unlike methods that rely on fixed color lists or hard-coded coordinates, GRASP enables robots to interpret abstract spatial concepts such as "top shelf" and execute tasks without additional fine-tuning. GRASP achieves a 73.3% overall success rate across 90 real-robot trials spanning three difficulty levels, with no task-specific training.
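
To make the idea of a bounding box serving as a goal more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Box` class, the `in_region` predicate, the tuple-based goal format, and all pixel coordinates are assumptions invented for illustration. It only shows how a symbolic goal such as "mug on the top shelf" could be checked against detected boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box in image pixels (y grows downward)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def inside(inner: Box, outer: Box) -> bool:
    """Return True when the center of `inner` lies within `outer`."""
    cx, cy = inner.center
    return outer.x_min <= cx <= outer.x_max and outer.y_min <= cy <= outer.y_max


def goal_satisfied(goal: tuple[str, str, str], boxes: dict[str, Box]) -> bool:
    """Check an assumed (predicate, object, region) goal against detections."""
    predicate, obj, region = goal
    if predicate != "in_region":
        raise ValueError(f"unsupported predicate: {predicate}")
    return inside(boxes[obj], boxes[region])


# Assumed detections; in a real system these would come from a detector.
boxes = {
    "mug": Box("mug", 100, 300, 160, 360),
    "top shelf": Box("top shelf", 50, 0, 400, 120),
}

# Assumed goal state that a VLM might emit for "put the mug on the top shelf".
goal = ("in_region", "mug", "top shelf")

before = goal_satisfied(goal, boxes)

# Simulate the scene after a successful placement by updating the mug's box.
boxes["mug"] = Box("mug", 200, 40, 260, 100)
after = goal_satisfied(goal, boxes)
```

In this sketch the region named by the language query ("top shelf") is itself a detected box, so no coordinates are hard-coded into the goal; a planner would act until the check on the goal passes.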

Subjects: Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition, Systems and Control

Published: 2026-06-11 05:09:34 UTC

2606.12910
