
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

Authors: Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He

Although subject-driven generation has been extensively explored in image generation owing to its wide applications, it still faces challenges in data scalability and subject expansibility. For the first challenge, moving from curating single-subject datasets to multi-subject ones and scaling them up is particularly difficult. For the second, most recent methods center on single-subject generation, making them hard to apply to multi-subject scenarios. In this study, we propose a highly consistent data synthesis pipeline to tackle these challenges. This pipeline harnesses the intrinsic in-context generation capabilities of diffusion transformers to produce highly consistent multi-subject paired data. Additionally, we introduce UNO, a multi-subject-driven customization architecture based on a diffusion transformer. UNO adopts a progressive cross-modal alignment training paradigm that moves from simpler single-subject conditioning to more complex multi-subject conditioning, together with a universal rotary position embedding (UnoPE) that adjusts the position indices. Extensive experiments show that our method achieves high consistency while ensuring controllability in both single-subject and multi-subject-driven generation. Code and model: https://github.com/bytedance/UNO.
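The abstract mentions that UnoPE works by adjusting rotary position indices for the conditioning tokens. A minimal sketch of the general idea, assuming a standard 1-D RoPE and a simple offset scheme in which each reference subject's tokens are shifted past the target image's index range (the function names and the exact offset rule here are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Standard RoPE: rotation angles for 1-D position indices.
    Returns an array of shape (len(positions), dim // 2)."""
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)

def offset_positions(target_len, ref_lens):
    """Hypothetical index-adjustment scheme: give each reference image a
    position range shifted past the target image's tokens, so condition
    tokens never share indices with the generated image's tokens."""
    ranges = [np.arange(target_len)]  # target image keeps indices 0..N-1
    start = target_len
    for n in ref_lens:                # each reference starts after the last
        ranges.append(np.arange(start, start + n))
        start += n
    return ranges

# A target image of 4 tokens plus two reference subjects of 3 and 2 tokens:
tgt, ref1, ref2 = offset_positions(4, [3, 2])
angles = rope_angles(tgt, 8)  # per-token rotation angles for the target
```

With this assignment, attention between target and reference tokens sees distinct, non-overlapping positions, which is one plausible way to keep multi-subject conditions separable.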

Subject: ICCV.2025 - Poster