Three-dimensional scene generation is crucial in computer vision, with applications spanning autonomous driving and gaming. However, current methods offer limited or non-intuitive user control. In this work, we propose a method that uses scene graphs as a user-friendly control format to generate outdoor 3D scenes. We develop an interactive system that transforms a sparse scene graph into a dense Bird's Eye View (BEV) Embedding Map, which guides a conditional diffusion model to generate 3D scenes that match the scene graph description. Users can easily create or modify scene graphs to generate large-scale outdoor scenes. We create a large-scale dataset with paired scene graphs and 3D semantic scenes to train the BEV embedding and diffusion models. Experimental results show that our approach consistently produces high-quality 3D urban scenes closely aligned with the input scene graphs. To the best of our knowledge, this is the first approach to generate 3D outdoor scenes conditioned on scene graphs. Code is available at https://github.com/yuhengliu02/control-3d-scene.
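To make the sparse-graph-to-dense-map idea concrete, the following is a minimal sketch (not the released implementation, which lives in the linked repository) of one way a scene graph's node embeddings could be rasterized onto a BEV grid that then conditions a diffusion model. All class, function, and parameter names here (SceneGraphEncoder, BEVEmbedder, grid_size, etc.) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class SceneGraphEncoder(nn.Module):
    """Embeds scene-graph nodes (object categories) into feature vectors.
    Hypothetical sketch: a real encoder would also aggregate edge/relation
    information, e.g. with message passing."""
    def __init__(self, num_categories: int, embed_dim: int = 64):
        super().__init__()
        self.node_embed = nn.Embedding(num_categories, embed_dim)

    def forward(self, node_ids: torch.Tensor) -> torch.Tensor:
        # node_ids: (N,) integer category per graph node
        return self.node_embed(node_ids)  # (N, embed_dim)

class BEVEmbedder(nn.Module):
    """Scatters sparse node features onto a dense Bird's Eye View grid,
    producing the map that conditions the diffusion model."""
    def __init__(self, embed_dim: int = 64, grid_size: int = 128):
        super().__init__()
        self.embed_dim = embed_dim
        self.grid_size = grid_size

    def forward(self, node_feats: torch.Tensor, node_xy: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, embed_dim); node_xy: (N, 2) integer grid coords
        bev = torch.zeros(self.embed_dim, self.grid_size, self.grid_size)
        bev[:, node_xy[:, 1], node_xy[:, 0]] = node_feats.T
        return bev  # (embed_dim, H, W) dense BEV Embedding Map

# Example wiring (illustrative category IDs and positions):
encoder = SceneGraphEncoder(num_categories=10)
bev_embedder = BEVEmbedder()
node_ids = torch.tensor([3, 7, 1])                       # e.g. road, building, tree
node_xy = torch.tensor([[10, 20], [64, 64], [100, 30]])  # BEV grid placements
bev_map = bev_embedder(encoder(node_ids), node_xy)
# bev_map would then condition the denoiser, e.g. model(x_t, t, cond=bev_map)
```

In this reading, editing the scene graph (adding a node or moving it on the grid) changes only the conditioning map, so regenerating the 3D scene requires no retraining, which is what makes the control interactive.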