Generalizable Multi-Camera 3D Object Detection from a Single Source via Fourier Cross-View Learning


Authors: Xue Zhao, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye

Improving the generalization of multi-camera 3D object detection is essential for safe autonomous driving in the real world. In this paper, we consider a realistic yet more challenging scenario: improving generalization when only single-source data is available for training, since gathering data from diverse domains and collecting annotations is time-consuming and labor-intensive. To this end, we propose the Fourier Cross-View Learning (FCVL) framework, which includes Fourier Hierarchical Augmentation (FHiAug), an augmentation strategy in the frequency domain that boosts domain diversity, and a Fourier Cross-View Semantic Consistency Loss that helps the model learn more domain-invariant features from adjacent perspectives. Furthermore, we provide theoretical guarantees via augmentation graph theory. To the best of our knowledge, this is the first study to explore generalizable multi-camera 3D object detection with a single source. Extensive experiments on various testing domains demonstrate that our approach achieves the best performance among the domain generalization methods compared.

Subject: ICML.2025 - Poster

R6ORNPrIdv@OpenReview
