Viewpoint-Agnostic Grasp Pipeline using VLM and Partial Observations

Authors: Dilermando Almeida, Juliano Negri, Guilherme Lazzarini, Thiago H. Segreto, Ranulfo Bezerra, Ricardo V. Godoy, Marcelo Becker

Robust grasping in cluttered, unstructured environments remains challenging for mobile legged manipulators due to occlusions that lead to partial observations, unreliable depth estimates, and the need for collision-free, execution-feasible approaches. In this paper, we present an end-to-end pipeline for language-guided grasping that bridges open-vocabulary target selection to safe grasp execution on a real robot. Given a natural-language command, the system grounds the target in RGB using open-vocabulary detection and promptable instance segmentation, extracts an object-centric point cloud from RGB-D, and improves geometric reliability under occlusion via back-projected depth compensation and two-stage point cloud completion. We then generate and collision-filter 6-DoF grasp candidates and select an executable grasp using safety-oriented heuristics that account for reachability, approach feasibility, and clearance. We evaluate the method on a quadruped robot with an arm in two cluttered tabletop scenarios, using paired trials against a view-dependent baseline. The proposed approach achieves a 90% overall success rate (9/10) against 30% (3/10) for the baseline, demonstrating substantially improved robustness to occlusions and partial observations in clutter.
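
The abstract does not give implementation details, so the following is only a minimal sketch of one step it names: extracting an object-centric point cloud from RGB-D given a segmentation mask. It assumes a standard pinhole camera model with known intrinsics; the function name `backproject_masked_depth` and the parameters `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not the authors' code, and the paper's depth compensation and point cloud completion stages are not reproduced here.

```python
import numpy as np


def backproject_masked_depth(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    Assumes a pinhole camera model (hypothetical helper, not from the paper).
    depth: (H, W) array of metric depth values; 0 or NaN marks missing depth.
    mask:  (H, W) array, nonzero where the target object was segmented.
    Returns an (N, 3) array of [x, y, z] points in the camera frame.
    """
    depth = np.asarray(depth, dtype=float)
    # Keep only pixels that belong to the object and have usable depth.
    valid = np.asarray(mask).astype(bool) & np.isfinite(depth) & (depth > 0)
    v, u = np.nonzero(valid)  # row (v) and column (u) pixel indices
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Tiny synthetic example: a 4x4 depth image with three masked pixels,
# one of which has missing depth and is therefore dropped.
depth = np.full((4, 4), 2.0)
depth[2, 0] = 0.0
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = mask[1, 3] = mask[2, 0] = True
points = backproject_masked_depth(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Pixels with missing depth are simply discarded in this sketch, which is the kind of gap that the depth compensation and completion stages described in the abstract are meant to address.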

Subjects: Robotics, Machine Learning, Systems and Control

Publish: 2026-03-09 00:42:32 UTC

2603.07866
