
Total: 1

#1 DICE: Staleness-Centric Optimizations for Parallel Diffusion MoE Inference

Authors: Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang

Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication bottlenecks. State-of-the-art methods alleviate this overhead in parallel diffusion inference through computation-communication overlapping, termed displaced parallelism. However, we identify that these techniques induce severe staleness: the use of outdated activations from previous timesteps, which significantly degrades quality, especially in expert-parallel scenarios. We tackle this fundamental tension and propose DICE, a staleness-centric optimization framework with a three-fold approach: (1) Interweaved Parallelism introduces staggered pipelines, effectively halving step-level staleness at no extra cost; (2) Selective Synchronization operates at the layer level and protects layers vulnerable to stale activations; and (3) Conditional Communication, a token-level, training-free method that dynamically adjusts communication frequency based on token importance. Together, these strategies effectively reduce staleness, achieving a 1.26x speedup with minimal quality degradation. Empirical results establish DICE as an effective and scalable solution. Our code is available at https://github.com/Cobalt-27/DICE
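To make the token-level idea concrete, below is a minimal sketch of conditional communication in PyTorch. The drift-based importance score, the `keep_ratio` parameter, and the `expert_forward` stub are illustrative assumptions, not the paper's actual implementation; the point is only the mechanism of refreshing a subset of tokens and serving the rest from a stale cache.

```python
# Sketch: refresh expert outputs only for the most important tokens,
# reusing cached (stale) outputs from an earlier timestep for the rest.
import torch

def expert_forward(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for the expensive expert-parallel path; in a real MoE
    # deployment this would be an all-to-all dispatch to remote experts.
    return torch.tanh(tokens)

def conditional_communication(hidden: torch.Tensor,
                              cache: torch.Tensor,
                              keep_ratio: float = 0.25) -> torch.Tensor:
    """hidden: (num_tokens, d_model) activations at the current timestep.
    cache:  (num_tokens, d_model) expert outputs cached at a prior timestep.
    """
    num_tokens = hidden.size(0)
    k = max(1, int(keep_ratio * num_tokens))

    # Assumed importance score: tokens whose activations drifted most
    # since the cached step get fresh expert outputs; the rest reuse
    # their stale cached outputs.
    importance = (hidden - cache).norm(dim=-1)
    refresh_idx = torch.topk(importance, k).indices

    output = cache.clone()
    output[refresh_idx] = expert_forward(hidden[refresh_idx])
    return output

# Toy usage: 512 tokens, 64-dim; only ~25% are communicated this step.
hidden = torch.randn(512, 64)
cache = torch.randn(512, 64)
out = conditional_communication(hidden, cache)
```

Shrinking the set of dispatched tokens shrinks the all-to-all payload, which is where the communication savings would come from; the quality cost stays bounded because only low-drift tokens are served stale outputs.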

Subject: ICCV.2025 - Poster