ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Authors: Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Schusterbauer-Fischer, Bjorn Ommer

Diffusion models have long been plagued by scalability and quadratic complexity issues, especially within transformer-based architectures. In this study, we leverage the long-sequence modeling capability of a State-Space Model called Mamba and extend its applicability to visual data generation. First, we identify a critical oversight in most current Mamba-based vision methods, namely that the scan scheme of Mamba does not account for spatial continuity. Second, building on this insight, we introduce a simple, plug-and-play, zero-parameter method named Zigzag Mamba, which outperforms Mamba-based baselines and demonstrates improved speed and memory utilization compared to transformer-based baselines. Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets, such as FacesHQ $1024\times 1024$, as well as UCF101, MultiModal-CelebA-HQ, and MS COCO $256\times 256$. Code will be available at https://anonymous.4open.science/r/sim_anonymous-4C27/README.md
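
To make the spatial-continuity point concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a 2D grid of image tokens that is flattened into a 1D sequence, and compares a plain row-major scan with a simple row-wise zigzag (boustrophedon) scan. The function names `raster_order`, `zigzag_order`, and `max_jump` are hypothetical and chosen only for this example; the scan schemes actually used in the paper may differ.

```python
def raster_order(h, w):
    """Row-major scan: every row is read left to right."""
    return [(r, c) for r in range(h) for c in range(w)]


def zigzag_order(h, w):
    """Row-wise zigzag scan: even rows left to right, odd rows right to left."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def max_jump(order):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )


if __name__ == "__main__":
    h, w = 4, 4
    print("raster max jump:", max_jump(raster_order(h, w)))
    print("zigzag max jump:", max_jump(zigzag_order(h, w)))
```

In the row-major scan, the last token of each row is followed by the first token of the next row, which sits at the opposite edge of the grid. In the zigzag scan, consecutive tokens in the sequence are always adjacent on the grid, and the reordering itself introduces no learnable parameters.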

Subject: ECCV.2024 - Poster

3141@2024@ECCV
