Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

#1 Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

Authors: Divyansh Srivastava, Xiang Zhang, He Wen, Chenru Wen, Zhuowen Tu

We present Lay-Your-Scene (shorthand LayouSyn), a novel text-to-layout generation pipeline for natural scenes. Prior scene layout generation methods are either closed-vocabulary or use proprietary large language models for open-vocabulary generation, limiting their modeling capabilities and broader applicability in controllable image generation. In this work, we propose to use lightweight open-source language models to obtain scene elements from text prompts and a novel aspect-aware diffusion Transformer architecture trained in an open-vocabulary manner for conditional layout generation. Extensive experiments demonstrate that LayouSyn outperforms existing methods and achieves state-of-the-art performance on challenging spatial and numerical reasoning benchmarks. Additionally, we present two applications of LayouSyn: First, we show that coarse initialization from large language models can be seamlessly combined with our method to achieve better results. Second, we present a pipeline for adding objects to images, demonstrating the potential of LayouSyn in image editing applications.

Subject: ICCV.2025 - Poster

Srivastava_Lay-Your-Scene_Natural_Scene_Layout_Generation_with_Diffusion_Transformers@ICCV2025@CVF
