2508.06543

MILD: Multi-Layer Diffusion Strategy for Complex and Precise Multi-IP Aware Human Erasing

Authors: Jinghan Yu, Zhiyuan Ma, Yue Ma, Kaiqi Liu, Yuhan Wang, Jianjun Li

Recent years have witnessed the success of diffusion models in image customization tasks. Prior works have achieved notable progress on human-oriented erasing using explicit mask guidance and semantic-aware inpainting. However, they struggle under complex multi-IP scenarios involving human-human occlusions, human-object entanglements, and background interference. These challenges are mainly due to: 1) dataset limitations, as existing datasets rarely cover dense occlusions, camouflaged backgrounds, and diverse interactions; 2) a lack of spatial decoupling, where foreground instances cannot be effectively disentangled, limiting clean background restoration. In this work, we introduce a high-quality multi-IP human erasing dataset with diverse pose variations and complex backgrounds. We then propose Multi-Layer Diffusion (MILD), a novel strategy that decomposes generation into semantically separated pathways for each instance and the background. To enhance human-centric understanding, we introduce Human Morphology Guidance, which integrates pose, parsing, and spatial relations. We further present Spatially-Modulated Attention to better guide attention flow. Extensive experiments show that MILD outperforms state-of-the-art methods on challenging human erasing benchmarks.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-08-05 13:56:24 UTC