iZy3ZgoHrD@OpenReview

Total: 1

#1 ZeroPatcher: Training-free Sampler for Video Inpainting and Editing

Authors: Shaoshu Yang, Yingya Zhang, Ran He

Video inpainting and editing have long been challenging tasks in the video generation community, typically requiring extensive computational resources and large datasets to train models to satisfactory performance. Recent breakthroughs in large-scale video foundation models have greatly enhanced text-to-video generation capabilities, which naturally suggests leveraging the prior knowledge of these powerful generators to facilitate video inpainting and editing. In this work, we investigate the feasibility of employing pre-trained text-to-video foundation models for high-quality video inpainting and editing without additional training. Specifically, we introduce a model-agnostic denoising sampler that optimizes the sampling trajectory by maximizing the expected log-likelihood conditioned on the known video segments. To enable efficient dynamic object removal and replacement, we propose a latent mask fuser that performs accurate video masking directly in latent space, eliminating the need for explicit VAE decoding and encoding. We implement our approach on widely used foundation generators such as CogVideoX and HunyuanVideo, demonstrating the model-agnostic nature of our sampler. Comprehensive quantitative and qualitative evaluations confirm that our method achieves outstanding video inpainting and editing performance in a plug-and-play fashion.
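The abstract does not spell out how the latent mask fuser works; the following is a minimal, hypothetical sketch of the general idea, assuming a standard recipe: the pixel-space mask is pooled down to latent resolution (here with an assumed VAE spatial compression factor of 8), and at each denoising step the generated latents inside the mask are blended with the latents of the known video outside it. Function names and the compression factor are illustrative, not taken from the paper.

```python
import numpy as np

def downsample_mask(pixel_mask: np.ndarray, stride: int = 8) -> np.ndarray:
    """Pool a binary pixel-space mask to latent resolution.

    `stride` is an assumed VAE spatial compression factor. A latent
    cell is treated as masked if any pixel inside it is masked.
    """
    h, w = pixel_mask.shape
    m = pixel_mask[: h - h % stride, : w - w % stride]
    m = m.reshape(h // stride, stride, w // stride, stride).mean(axis=(1, 3))
    return (m > 0).astype(np.float32)

def fuse_latents(x_gen: np.ndarray, x_known: np.ndarray,
                 latent_mask: np.ndarray) -> np.ndarray:
    """Keep generated latents inside the mask, known latents outside.

    x_gen, x_known: (channels, H_lat, W_lat) latent tensors at the
    same diffusion timestep; latent_mask: (H_lat, W_lat) in {0, 1}.
    """
    m = latent_mask[None, :, :]  # broadcast over the channel axis
    return m * x_gen + (1.0 - m) * x_known

# Toy example: mask the top-left 8x8 block of a 16x16 frame.
pixel_mask = np.zeros((16, 16), dtype=np.float32)
pixel_mask[:8, :8] = 1.0
lat_mask = downsample_mask(pixel_mask)          # shape (2, 2)
fused = fuse_latents(np.ones((4, 2, 2)), np.zeros((4, 2, 2)), lat_mask)
```

Because the blend happens on latents, no VAE decode/encode round-trip is needed during sampling, which is the efficiency gain the abstract attributes to the fuser.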

Subject: NeurIPS.2025 - Poster