Hand-Object Interaction Pretraining from Videos

Authors: Himanshu Gaurav Singh, Antonio Loquercio, Carmelo Sferrazza, Jane Wu, Haozhi Qi, Pieter Abbeel, Jitendra Malik

We present an approach to learn general robot manipulation priors from 3D hand-object interaction trajectories. We build a framework that uses in-the-wild videos to generate sensorimotor robot trajectories. We do so by lifting both the human hand and the manipulated object into a shared 3D space and retargeting human motions to robot actions. Generative modeling on this data yields a task-agnostic base policy. This policy captures a general yet flexible manipulation prior. We empirically demonstrate that finetuning this policy, with both reinforcement learning (RL) and behavior cloning (BC), enables sample-efficient adaptation to downstream tasks and simultaneously improves robustness and generalizability compared to prior approaches. Qualitative experiments are available at: https://hgaurav2k.github.io/hop/
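The idea of turning lifted hand-object trajectories into robot training data can be illustrated with a deliberately simplified sketch. It assumes that a lifting stage has already produced per-frame 3D wrist and object positions in the camera frame, and that a known rigid camera-to-robot transform (R, t) is available. The function names, the 6-D observation (wrist plus object position), and the delta-position action space are hypothetical choices made for illustration. This is not the authors' retargeting method, and the sketch ignores hand orientation and finger articulation entirely.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform_point(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform x' = R x + t (camera frame -> robot base frame)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def build_pairs(
    wrist_cam: List[Vec3], object_cam: List[Vec3], R: Mat3, t: Vec3
) -> List[Tuple[Tuple[float, ...], Vec3]]:
    """Hypothetical retargeting: (observation, action) pairs from a lifted trajectory.

    Observation at step k: wrist and object positions in the robot frame (6-D).
    Action at step k: wrist displacement to step k+1, treated as an
    end-effector delta-position command (an assumed action space).
    """
    wrist = [transform_point(R, t, p) for p in wrist_cam]
    obj = [transform_point(R, t, p) for p in object_cam]
    pairs = []
    for k in range(len(wrist) - 1):
        obs = wrist[k] + obj[k]  # tuple concatenation -> 6-D observation
        action = tuple(wrist[k + 1][i] - wrist[k][i] for i in range(3))
        pairs.append((obs, action))
    return pairs


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    wrist = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.1, 0.2, 0.0)]
    obj = [(1.0, 0.0, 0.0)] * 3
    for obs, act in build_pairs(wrist, obj, identity, (0.0, 0.0, 0.5)):
        print(obs, act)
```

Pairs of this form are the kind of data a generative policy could be pretrained on before being finetuned with RL or BC; a real pipeline would need a far richer state and action representation than this toy example.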

Subjects: Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition

Published: 2024-09-12 17:59:07 UTC

arXiv: 2409.08273