VideoMaMa: Mask-Guided Video Matting via Generative Prior

Authors: Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present the Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which is more robust on in-the-wild videos than the same model trained on existing matting datasets. These findings emphasize the importance of large-scale pseudo-labeled video matting data and show how generative priors and accessible segmentation cues can drive scalable progress in video matting research.

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence

Publish: 2026-01-20 18:59:56 UTC

2601.14255
