SGAR: Structural Generative Augmentation for 3D Human Motion Retrieval


Authors: Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu

3D human motion-text retrieval is essential for accurate motion understanding and is approached through cross-modal alignment learning. Existing methods typically align global motion and text concepts directly. They generalize sub-optimally because a single motion or text sequence couples multiple motion concepts, which makes the learned correspondence between them uncertain. We therefore study explicit fine-grained concept decomposition for alignment learning and present a novel framework, Structural Generative Augmentation for 3D Human Motion Retrieval (SGAR), to enable generation-augmented retrieval. Specifically, relying on the strong priors of existing large language model (LLM) assets, we decompose human motions structurally into finer semantic units, i.e., body parts, for fine-grained motion modeling. Based on this decomposition, we develop part-mixture learning to better decouple the learning of local motion concepts, boosting part-level alignment. Moreover, we incorporate a directional relation alignment strategy that exploits the correspondence between full-body and part motions to regularize the feature manifold for better consistency. Extensive experiments on three benchmarks, covering motion-text retrieval as well as recognition and generation applications, demonstrate the superior performance and promising transferability of our method.
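As a rough illustration of why part-level decomposition can help retrieval, the sketch below ranks motions for a text query by blending a global similarity with the mean similarity over shared body parts. It is a generic example and not the SGAR method: the dict layout, the function names `cosine`, `score`, and `retrieve`, and the `part_weight` parameter are all assumptions made for illustration, and the embeddings are presumed to come from some pretrained motion and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score(motion, text, part_weight=0.5):
    """Blend global similarity with the mean similarity over shared body parts.

    `motion` and `text` are hypothetical dicts of the form
    {"global": vector, "parts": {part_name: vector}}.
    `part_weight` is an illustrative hyperparameter, not a value from the paper.
    """
    global_sim = cosine(motion["global"], text["global"])
    shared = sorted(set(motion["parts"]) & set(text["parts"]))
    if not shared:
        # No common body parts described: fall back to the global score only.
        return global_sim
    part_sim = sum(
        cosine(motion["parts"][name], text["parts"][name]) for name in shared
    ) / len(shared)
    return (1 - part_weight) * global_sim + part_weight * part_sim


def retrieve(text, motions, part_weight=0.5):
    """Return indices of `motions`, ranked from best to worst match for `text`."""
    return sorted(
        range(len(motions)),
        key=lambda i: score(motions[i], text, part_weight),
        reverse=True,
    )
```

With `part_weight=0` the ranking reduces to purely global matching, so comparing rankings at different weights shows how much the part-level terms change the result for a given query.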

Subject: NeurIPS.2025 - Poster

9GHaLDORNL@OpenReview
