Generating harmonious and diverse human motions from music signals, especially for a group of dancers, is a practical yet challenging task in virtual avatar creation. Existing methods only model a fixed number of dancers and thus lack the flexibility to handle an arbitrary number of individuals. To address this limitation, we propose FreeDance, a novel unified framework. To generate an arbitrary number of dancers while preserving the diverse dynamics of multiple individuals, we build the framework upon collaborative masked token modeling in a 2D discrete space. In particular, we devise a Cross-modality Residual Alignment Module (CRAM) that diversifies the movement of each individual and strengthens its alignment with the music. CRAM captures the spatial motion deformation of each dancer using residual learning and integrates it with the rhythmic representation to reinforce their intrinsic connection. Moreover, recognizing the need for interactive coordination, we design a Temporal Interaction Module (TIM). Benefiting from masked 2D motion tokens, TIM effectively models the temporal correlation between each individual and its neighboring dancers as interaction guidance, fostering stronger inter-dancer dependencies. Extensive experiments demonstrate that our approach generates harmonious group dances with any number of individuals, outperforming state-of-the-art methods adapted from their fixed-number counterparts. Code is available at https://github.com/Tsukasane/FreeDance.
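To illustrate how the two modules described above could be wired together, the following is a minimal PyTorch sketch. All class names, tensor shapes, and hyper-parameters are illustrative assumptions made for exposition, not the released FreeDance implementation.

```python
# Minimal sketch, assuming per-dancer token features of shape (B, T, D) and
# a music feature of shape (B, T, D); names and shapes are hypothetical.
import torch
import torch.nn as nn


class CrossModalityResidualAlignment(nn.Module):
    """Hypothetical CRAM: learn a per-dancer motion residual and fuse it
    with a rhythmic music representation via cross-attention."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.residual_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, motion_tok: torch.Tensor, music_feat: torch.Tensor) -> torch.Tensor:
        residual = self.residual_mlp(motion_tok)            # spatial deformation residual
        aligned, _ = self.cross_attn(residual, music_feat, music_feat)
        return motion_tok + aligned                         # music-aligned dancer features


class TemporalInteractionModule(nn.Module):
    """Hypothetical TIM: each dancer attends to the token sequences of its
    neighbours to obtain interaction guidance."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.inter_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, dancers: torch.Tensor) -> torch.Tensor:
        # dancers: (B, N, T, D) features for an arbitrary number N of dancers
        B, N, T, D = dancers.shape
        out = []
        for i in range(N):
            query = dancers[:, i]                           # (B, T, D)
            if N > 1:
                neighbours = torch.cat(
                    [dancers[:, j] for j in range(N) if j != i], dim=1
                )                                           # (B, (N-1)*T, D)
            else:
                neighbours = query
            guidance, _ = self.inter_attn(query, neighbours, neighbours)
            out.append(query + guidance)
        return torch.stack(out, dim=1)                      # (B, N, T, D)


if __name__ == "__main__":
    B, N, T, D = 2, 3, 16, 256                              # 3 dancers, arbitrary N works
    motion = torch.randn(B, N, T, D)
    music = torch.randn(B, T, D)
    cram = CrossModalityResidualAlignment(D)
    tim = TemporalInteractionModule(D)
    per_dancer = torch.stack([cram(motion[:, i], music) for i in range(N)], dim=1)
    fused = tim(per_dancer)
    print(fused.shape)  # torch.Size([2, 3, 16, 256])
```

Because neither module fixes N, the same weights can be applied to any group size, which matches the flexibility the framework aims for; the actual masked token prediction and 2D discrete codebook are omitted here.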