Diffusion models have transformed image editing but struggle with precise depth-aware control, such as placing objects at a specified depth. Layered representations offer fine-grained control by decomposing an image into separate editable layers. However, existing methods simplistically represent a scene as a set of background and transparent foreground layers while ignoring scene geometry, which limits their effectiveness for depth-aware editing. We propose Depth-Guided Layer Decomposition, a layering method that decomposes an image into foreground and background layers based on a user-specified depth value, enabling precise depth-aware edits. We further propose Feature-Guided Layer Compositing, a zero-shot approach for realistic layer compositing that leverages generative priors from pretrained diffusion models. Specifically, we guide the internal U-Net features to progressively fuse individual layers into a composite latent at each denoising step. This preserves the structure of individual layers while producing realistic outputs with appropriate color and lighting adjustments, without the need for post-hoc harmonization models. We demonstrate our method on two key depth-aware editing tasks: 1) scene compositing, which blends the foreground of one scene with the background of another at a specified depth; and 2) object insertion at a user-defined depth. Our zero-shot approach achieves precise depth ordering and high-quality edits, surpassing specialized scene compositing and object placement baselines, as validated across benchmarks and user studies.
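To make the depth-guided decomposition step concrete, the following is a minimal sketch, not the paper's implementation: it assumes a per-pixel depth map is available (e.g., from an off-the-shelf monocular depth estimator) and splits the image into foreground and background layers at a hypothetical user-specified threshold `d`, encoding the split in the alpha channel.

```python
import numpy as np

def decompose_by_depth(image: np.ndarray, depth: np.ndarray, d: float):
    """Split an RGB image into foreground/background layers at depth d.

    image: (H, W, 3) uint8 RGB image.
    depth: (H, W) per-pixel depth map (e.g., from a monocular depth estimator),
           where smaller values are closer to the camera (an assumption here).
    d:     user-specified depth threshold.
    Returns RGBA foreground and background layers; alpha marks layer membership.
    """
    fg_mask = (depth < d).astype(np.uint8)                    # pixels in front of d
    alpha_fg = (fg_mask * 255)[..., None]
    alpha_bg = ((1 - fg_mask) * 255)[..., None]
    foreground = np.concatenate([image, alpha_fg], axis=-1)   # (H, W, 4)
    background = np.concatenate([image, alpha_bg], axis=-1)   # (H, W, 4)
    return foreground, background

# Toy usage with random data standing in for a real image and depth map.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
dep = rng.random((256, 256)).astype(np.float32)
fg, bg = decompose_by_depth(img, dep, d=0.5)
```

The resulting layers can then be recombined, or mixed with layers from another scene or an inserted object, by the feature-guided compositing stage, which fuses them into a single latent over the denoising trajectory rather than via simple alpha blending.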