MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities

Authors: Bizhu Wu, Jinheng Xie, Keming Shen, Zhe Kong, Jianfeng Ren, Ruibin Bai, Rong Qu, Linlin Shen

Recent motion-aware large language models have demonstrated promising potential in unifying motion comprehension and generation. However, existing studies often focus on coarse-grained motion-text modeling, limiting their ability to handle fine-grained motion-relevant tasks. To overcome this limitation, we present MG-MotionLLM, a unified motion-language model for multi-granular motion comprehension and generation. We further introduce a comprehensive multi-granularity training scheme that incorporates a set of novel auxiliary tasks, such as localizing the temporal boundaries of motion segments from detailed text and generating detailed motion captions, to facilitate mutual reinforcement of motion-text modeling across levels of granularity. Extensive experiments show that MG-MotionLLM achieves superior performance on the classical text-to-motion and motion-to-text tasks, and exhibits potential on novel fine-grained motion comprehension and editing tasks. Dataset and code will be released upon paper acceptance.
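
To make the multi-granularity training scheme concrete, below is a minimal, hypothetical sketch of how such motion-text tasks could be cast as unified text-to-text instruction pairs over discrete motion tokens. Everything here (the `MotionSample` fields, the `<motion_k>` token format, the task phrasings) is an illustrative assumption, not the paper's actual data format or implementation.

```python
# Hypothetical sketch: rendering one motion sample into several
# granularity-aware (instruction, target) pairs, in the spirit of
# MG-MotionLLM's auxiliary tasks. Names and formats are assumptions.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MotionSample:
    motion_tokens: List[int]       # discrete codes from a motion tokenizer (assumed)
    coarse_caption: str            # sequence-level description
    fine_caption: str              # body-part / frame-level description
    segment: Tuple[int, int]       # (start_frame, end_frame) of a sub-motion


def to_motion_string(tokens: List[int]) -> str:
    """Serialize discrete motion codes as special text tokens."""
    return "".join(f"<motion_{t}>" for t in tokens)


def build_instruction_pairs(s: MotionSample) -> List[dict]:
    """Turn one sample into one training pair per task granularity."""
    motion_str = to_motion_string(s.motion_tokens)
    return [
        # Coarse-grained text-to-motion generation.
        {"input": f"Generate a motion: {s.coarse_caption}",
         "target": motion_str},
        # Fine-grained motion captioning (auxiliary task).
        {"input": f"Describe this motion in detail: {motion_str}",
         "target": s.fine_caption},
        # Temporal localization of a segment via detailed text (auxiliary task).
        {"input": f"In {motion_str}, when does this happen: {s.fine_caption}",
         "target": f"frames {s.segment[0]} to {s.segment[1]}"},
    ]


if __name__ == "__main__":
    sample = MotionSample(
        motion_tokens=[12, 7, 98, 3],
        coarse_caption="a person waves with the right hand",
        fine_caption="the right arm raises, then the wrist oscillates twice",
        segment=(10, 45),
    )
    for pair in build_instruction_pairs(sample):
        print(pair["input"][:60], "->", pair["target"][:40])
```

Casting every task as plain text in and out is what lets a single language model be trained jointly on all granularities; the auxiliary pairs give the model supervision at finer levels than sequence-wide captions alone.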

Subject: CVPR.2025 - Poster