
Total: 1

#1 Decoupling Common and Unique Representations for Multimodal Self-supervised Learning

Authors: Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu

The increasing availability of multi-sensor data sparks interest in multimodal self-supervised learning. However, most existing approaches learn only common representations across modalities while ignoring intra-modal training and modality-unique representations. We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning. By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities. Meanwhile, a simple residual deformable attention is introduced to help the model focus on modality-informative features. We evaluate DeCUR in three common multimodal scenarios (radar-optical, RGB-elevation, and RGB-depth) and demonstrate consistent and significant improvements in both multimodal and modality-missing settings. With thorough experiments and comprehensive analysis, we hope this work provides insights and raises further interest in the hidden relationships of multimodal representations.
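To make the abstract's "multimodal redundancy reduction" concrete, below is a minimal PyTorch sketch of a decoupled cross-modal objective in the spirit described: a Barlow Twins-style cross-correlation loss where embedding dimensions are split into common and unique subsets, aligning the common block across modalities while decorrelating the unique block. The function names, the `num_common` split, and the loss weighting are illustrative assumptions, not the authors' exact formulation; the intra-modal terms and the residual deformable attention mentioned in the abstract are omitted for brevity.

```python
import torch

def cross_corr(za: torch.Tensor, zb: torch.Tensor) -> torch.Tensor:
    # Batch-normalize each embedding dimension, then compute the D x D
    # cross-correlation matrix between the two sets of embeddings.
    za = (za - za.mean(0)) / (za.std(0) + 1e-6)
    zb = (zb - zb.mean(0)) / (zb.std(0) + 1e-6)
    return (za.T @ zb) / za.shape[0]

def redundancy_reduction(c: torch.Tensor, target_diag: float,
                         lambd: float = 5e-3) -> torch.Tensor:
    # Barlow Twins-style term: drive diagonal entries toward target_diag
    # (1 to align dimensions, 0 to decorrelate them), off-diagonal toward 0.
    diag = torch.diagonal(c)
    on_diag = (diag - target_diag).pow(2).sum()
    off_diag = (c - torch.diag_embed(diag)).pow(2).sum()
    return on_diag + lambd * off_diag

def decoupled_cross_modal_loss(z1: torch.Tensor, z2: torch.Tensor,
                               num_common: int) -> torch.Tensor:
    # z1, z2: (batch, D) projector outputs from two modality-specific encoders.
    # The first num_common dimensions are treated as common across modalities,
    # the remaining D - num_common as modality-unique (split is illustrative).
    c = cross_corr(z1, z2)
    common = redundancy_reduction(c[:num_common, :num_common], target_diag=1.0)
    unique = redundancy_reduction(c[num_common:, num_common:], target_diag=0.0)
    return common + unique

# Toy usage with random stand-ins for radar/optical embeddings.
z_radar, z_optical = torch.randn(32, 128), torch.randn(32, 128)
loss = decoupled_cross_modal_loss(z_radar, z_optical, num_common=96)
```

Under this reading, the common block carries information shared by both sensors while the unique block is explicitly prevented from correlating across modalities, which is what lets the method retain modality-specific signal in modality-missing settings.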

Subject: ECCV.2024 - Oral