Recovering missing modalities in multimodal learning has recently been approached with diffusion models that synthesize the absent data conditioned on the available modalities. However, existing methods often suffer from modality generation bias: certain modalities are generated with high fidelity, while others--such as video--remain challenging due to intrinsic modality gaps, leading to imbalanced training. To address this issue, we propose MD^2N (Multi-stage Duplex Diffusion Network), a novel framework for unbiased missing-modality recovery. MD^2N introduces a modality transfer module within a duplex diffusion architecture, enabling bidirectional generation between available and missing modalities through three stages: (1) global structure generation, (2) modality transfer, and (3) local cross-modal refinement. Under duplex diffusion training, the available and missing modalities generate each other in an interleaved manner, driving the model toward a balanced generation state. Extensive experiments demonstrate that MD^2N significantly outperforms existing state-of-the-art methods, achieving up to a 4% improvement over IMDer on the CMU-MOSEI dataset. Project page: https://crystal-punk.github.io/.
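To make the three-stage duplex structure concrete, the following is a minimal sketch of the bidirectional pipeline, assuming feature-level generation. It is not the authors' implementation: the module names, dimensions, loss, and the reduction of each diffusion stage to a single MLP pass (omitting the sampling loop) are all illustrative assumptions.

```python
# Hypothetical sketch of a duplex, three-stage recovery pipeline:
# (1) global structure generation, (2) modality transfer,
# (3) local cross-modal refinement, run in both directions.
import torch
import torch.nn as nn


class Stage(nn.Module):
    """One stage, reduced to a small MLP for illustration."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, x):
        return self.net(x)


class DuplexDiffusion(nn.Module):
    """Both directions share the modality transfer module (an assumption),
    so the available and missing modalities generate each other."""

    def __init__(self, dim=64):
        super().__init__()
        self.global_a2b = Stage(dim)  # stage 1, direction A -> B
        self.global_b2a = Stage(dim)  # stage 1, direction B -> A
        self.transfer = Stage(dim)    # stage 2, shared transfer module
        self.refine_a2b = Stage(dim)  # stage 3, direction A -> B
        self.refine_b2a = Stage(dim)  # stage 3, direction B -> A

    def forward(self, feat_a, feat_b):
        # A -> B: coarse global structure, transfer into the target
        # modality, then refinement conditioned (here, additively)
        # on the available modality.
        b_hat = self.refine_a2b(
            self.transfer(self.global_a2b(feat_a)) + feat_a
        )
        # B -> A runs symmetrically, so training sees both directions.
        a_hat = self.refine_b2a(
            self.transfer(self.global_b2a(feat_b)) + feat_b
        )
        return a_hat, b_hat


# Toy usage: a symmetric reconstruction loss over both directions
# illustrates how duplex training can balance the two modalities.
model = DuplexDiffusion(dim=64)
feat_a, feat_b = torch.randn(8, 64), torch.randn(8, 64)
a_hat, b_hat = model(feat_a, feat_b)
loss = (
    nn.functional.mse_loss(a_hat, feat_a)
    + nn.functional.mse_loss(b_hat, feat_b)
)
loss.backward()
```

The symmetric loss is the key design point of this sketch: because each direction's generation quality feeds the other's training signal, neither modality is optimized in isolation, which is how the duplex scheme counteracts generation bias.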