2505.12861

RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection

Authors: Jiaqi Tan, Xu Zheng, Yang Liu

Multi-modal semantic segmentation (MMSS) faces significant challenges in real-world applications due to incomplete, degraded, or missing sensor data. While current MMSS methods typically use self-distillation with modality dropout to improve robustness, they largely overlook inter-modal correlations and thus suffer significant performance degradation when no modalities are missing. To this end, we present RMMSS, a two-stage framework designed to progressively enhance model robustness under missing-modality conditions while maintaining strong performance in full-modality scenarios. It comprises two key components: the Hybrid Prototype Distillation Module (HPDM) and the Feature Selection Module (FSM). In the first stage, we pre-train a teacher model on full-modality data and then introduce HPDM to perform cross-modal knowledge distillation, yielding a highly robust model. In the second stage, we freeze both the pre-trained full-modality teacher model and the robust model, and propose a trainable FSM that extracts optimal representations from both the feature and logits layers of the two models via feature score calculation. This process learns a final student model that maintains strong robustness while achieving high performance under full-modality conditions. Experiments on three datasets demonstrate that our method improves missing-modality performance by 2.80%, 3.89%, and 0.89%, respectively, compared to the state of the art, while causing almost no drop in full-modality performance (only -0.1% mIoU). In addition, two different backbones (AnySeg and CMNeXt) are used to validate the generalizability of our framework.
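
The abstract does not spell out how HPDM computes or matches prototypes, so the following is only a minimal sketch of the two generic ingredients it mentions: modality dropout and prototype-based distillation. It assumes that a class prototype is the mean feature vector of the pixels labelled with that class, and that the distillation loss is a mean squared error between teacher and student prototypes. The function names, the modality names, and the choice of loss are hypothetical and are not taken from the paper.

```python
import random


def class_prototypes(features, labels, num_classes):
    """Mean feature vector of the pixels in each class (assumed prototype definition).

    features: list of per-pixel feature vectors (lists of floats)
    labels:   list of per-pixel class ids, same length as features
    Returns {class_id: prototype}; classes with no pixels are skipped.
    """
    dim = len(features[0])
    sums = {c: [0.0] * dim for c in range(num_classes)}
    counts = {c: 0 for c in range(num_classes)}
    for vec, c in zip(features, labels):
        counts[c] += 1
        for i, v in enumerate(vec):
            sums[c][i] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums if counts[c] > 0}


def prototype_distill_loss(teacher_protos, student_protos):
    """Mean squared error over prototypes of classes present in both models.

    The MSE is an illustrative choice, not the loss used by the paper.
    """
    shared = sorted(set(teacher_protos) & set(student_protos))
    if not shared:
        return 0.0
    total, n = 0.0, 0
    for c in shared:
        for t, s in zip(teacher_protos[c], student_protos[c]):
            total += (t - s) ** 2
            n += 1
    return total / n


def drop_modalities(modalities, keep_prob, rng):
    """Modality dropout: keep each modality with probability keep_prob.

    Always returns at least one modality so the student has some input.
    """
    kept = [m for m in modalities if rng.random() < keep_prob]
    return kept or [rng.choice(modalities)]


# Toy usage: the teacher sees all modalities, the student a random subset.
rng = random.Random(0)
student_inputs = drop_modalities(["rgb", "depth", "lidar"], 0.5, rng)  # hypothetical names
teacher = class_prototypes([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0, 0, 1], num_classes=3)
student = {0: [2.0, 1.0], 1: [5.0, 6.0]}  # stand-in for student prototypes
loss = prototype_distill_loss(teacher, student)
```

In a real training loop the feature vectors would come from the segmentation backbone and the loss would be computed on tensors with automatic differentiation; the plain-list version here only shows the arithmetic.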

Subject: Computer Vision and Pattern Recognition

Publish: 2025-05-19 08:46:03 UTC