We propose Layer Decomposition of Graphic Designs (LDGD), a novel vision task that converts a composite graphic design (e.g., a poster) into a structured representation comprising ordered RGB-A layers and metadata. By transforming visual content into structured data, LDGD enables precise image editing and offers significant advantages for digital content creation, management, and reuse. The task presents two core challenges: (1) predicting the attribute information (metadata) of each layer, and (2) recovering the occluded regions of overlapping layers to enable high-fidelity image reconstruction. To address these challenges, we present the Decompose Layer Model (DeaM), a large unified multimodal model that integrates a conjoined visual encoder, a language model, and a condition-aware RGB-A decoder. DeaM adopts a two-stage pipeline: it first generates layer-specific metadata containing information such as spatial coordinates and quantized encodings, and then reconstructs pixel-accurate layer images with the condition-aware RGB-A decoder. Beyond full decomposition, the model also supports interactive decomposition via textual or point-based prompts. Extensive experiments demonstrate the effectiveness of the proposed method. The code is available at https://github.com/witnessai/DeaM.
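To make the two-stage pipeline concrete, the sketch below outlines the inference flow described above in Python. All names (LayerMeta, predict_metadata, decode_rgba, decompose) and the stubbed return values are hypothetical illustrations of the stage-1 metadata prediction and stage-2 RGB-A decoding, not the actual DeaM interface; see the linked repository for the real implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class LayerMeta:
    """Per-layer metadata as described in the abstract (field names are illustrative)."""
    order: int                        # stacking position, 0 = bottom layer
    bbox: Tuple[int, int, int, int]   # spatial coordinates (x, y, w, h)
    codes: List[int]                  # quantized encoding tokens for the layer

def predict_metadata(composite: bytes, prompt: Optional[str] = None) -> List[LayerMeta]:
    """Stage 1 (stubbed): the unified multimodal model would map the composite
    image, plus an optional textual or point-based prompt for interactive
    decomposition, to an ordered list of layer metadata."""
    # Dummy output standing in for real model predictions.
    return [
        LayerMeta(order=0, bbox=(0, 0, 512, 512), codes=[12, 87, 3]),
        LayerMeta(order=1, bbox=(40, 60, 200, 120), codes=[5, 91]),
    ]

def decode_rgba(composite: bytes, meta: LayerMeta) -> bytes:
    """Stage 2 (stubbed): the condition-aware RGB-A decoder would reconstruct
    the pixel-accurate layer, including occluded regions, conditioned on the
    stage-1 metadata."""
    w, h = meta.bbox[2], meta.bbox[3]
    return bytes(w * h * 4)  # placeholder RGBA pixel buffer

def decompose(composite: bytes, prompt: Optional[str] = None):
    """Full pipeline: predict metadata first, then reconstruct each layer."""
    return [(m, decode_rgba(composite, m)) for m in predict_metadata(composite, prompt)]

# Usage: full decomposition, or interactive decomposition via a text prompt.
layers = decompose(b"raw poster bytes", prompt="extract the title text layer")
for meta, rgba in layers:
    print(meta.order, meta.bbox, len(rgba))
```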