Generative modeling aims to transform random noise into structured outputs that align with the training data distribution. In this work, we enhance video diffusion models by introducing motion control through structured latent noise sampling. Specifically, we propose a novel real-time noise-warping method that replaces independent per-frame Gaussian noise with temporally correlated warped noise derived from optical flow fields, enabling fine-grained motion control independent of model architecture and guidance type. We fine-tune modern video diffusion base models and provide a unified paradigm for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. By preserving spatial Gaussianity while efficiently maintaining temporal consistency, our noise-warping algorithm enables flexible and diverse motion control applications with minimal trade-offs in pixel quality and temporal coherence. Extensive experiments and user studies demonstrate the advantages of our method in terms of visual quality, motion controllability, and temporal consistency, making it a robust and scalable solution for motion-controllable video synthesis.
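To make the core idea concrete, the sketch below illustrates flow-based noise warping in its simplest form: the latent noise of one frame is advected along an optical flow field so that successive frames draw temporally correlated noise. This is only a minimal illustration under assumed shapes and names (`warp_noise_with_flow`, a placeholder flow tensor), not the paper's algorithm; in particular, bilinear resampling plus naive per-frame re-whitening does not exactly preserve spatial Gaussianity, which is precisely the property the proposed noise-warping method is designed to maintain.

```python
import torch
import torch.nn.functional as F

def warp_noise_with_flow(noise: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a noise frame backward along an optical flow field.

    noise: (B, C, H, W) Gaussian noise associated with the previous frame.
    flow:  (B, 2, H, W) optical flow in pixels (dx, dy) mapping the current
           frame back to the previous frame.

    Illustrative baseline only: bilinear interpolation correlates neighboring
    samples, so the warped noise is no longer spatially white.
    """
    B, _, H, W = noise.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=noise.device, dtype=noise.dtype),
        torch.arange(W, device=noise.device, dtype=noise.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # (B, H, W)
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize sampling locations to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (W - 1) - 1.0, 2.0 * grid_y / (H - 1) - 1.0), dim=-1
    )  # (B, H, W, 2)
    warped = F.grid_sample(
        noise, grid, mode="bilinear", padding_mode="border", align_corners=True
    )
    # Naive per-frame re-whitening (an assumption, not the paper's method):
    # rescale to zero mean / unit variance so the diffusion sampler still sees
    # approximately standard-normal marginals.
    warped = (warped - warped.mean(dim=(2, 3), keepdim=True)) / (
        warped.std(dim=(2, 3), keepdim=True) + 1e-6
    )
    return warped

# Usage: carry one frame's latent noise forward along estimated flow so that
# consecutive diffusion samples share motion-aligned, correlated noise.
noise_prev = torch.randn(1, 4, 64, 64)
flow = torch.zeros(1, 2, 64, 64)  # placeholder; in practice from an optical-flow model
noise_next = warp_noise_with_flow(noise_prev, flow)
```

The design point this sketch highlights is the trade-off the abstract refers to: simple resampling achieves temporal correlation but degrades the noise distribution, whereas the proposed method maintains both temporal consistency and spatial Gaussianity in real time.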