Current virtual try-on methods primarily improve performance through network-level optimization, employing strategies such as coarse-to-fine structures and ReferenceNet to inject clothing information. However, these methods are often constrained by the limited quantity and diversity of training samples, which ultimately restricts further improvement. To address this challenge, we propose a unified, mask-free virtual try-on framework. Our approach leverages the inherent strengths of latent diffusion models so that each pipeline component more accurately models its target distribution, thereby achieving superior performance. Specifically, we introduce a text-driven pseudo-input preparation strategy that substantially increases the diversity of clothing regions in the generated person pseudo-samples, encouraging the generator to focus on variations in these regions and improving its generalization capability. Within the generator, we develop a gated manipulation mechanism that prevents forgetting of pretrained weights and reduces training costs. Furthermore, we incorporate a texture-aware injection module to explicitly integrate human-perceptible clothing texture information into the generation process. During inference, we propose a refining conditional inference strategy that mitigates the randomness introduced by Gaussian noise, effectively preserving identity information and fine clothing details in the final results. Extensive experiments demonstrate that our method outperforms existing state-of-the-art virtual try-on approaches.
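As a rough illustration of how a gated manipulation mechanism of this kind can avoid disturbing pretrained weights, the sketch below injects clothing features into frozen backbone features through a zero-initialized gate, so the module starts as an identity mapping and only the small adapter and gate are trained. This is a minimal sketch under assumed names and shapes (e.g., `GatedInjection`, `cloth_feat`, a 320-channel feature map), not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class GatedInjection(nn.Module):
    """Hypothetical gated injection of clothing features into a frozen backbone."""

    def __init__(self, channels: int):
        super().__init__()
        # Lightweight projection of the clothing feature into the backbone's feature space.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Scalar gate initialized to zero: the module behaves as an identity at the
        # start of training, leaving the pretrained backbone output unchanged.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, backbone_feat: torch.Tensor, cloth_feat: torch.Tensor) -> torch.Tensor:
        # Clothing information is blended in gradually as the gate is learned.
        return backbone_feat + self.gate * self.proj(cloth_feat)


if __name__ == "__main__":
    backbone_feat = torch.randn(1, 320, 64, 64)  # assumed frozen U-Net feature map
    cloth_feat = torch.randn(1, 320, 64, 64)     # assumed clothing feature map
    layer = GatedInjection(channels=320)
    out = layer(backbone_feat, cloth_feat)
    # With the gate at zero, the output initially equals the backbone feature.
    print(torch.allclose(out, backbone_feat))    # True
```

Because only the gate and the 1x1 projection receive gradients while the backbone stays frozen, such a design keeps the number of trainable parameters small, which is consistent with the stated goals of avoiding forgetting and reducing training cost.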