ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Authors: Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum

Neural network performance scales with both model size and data volume, as shown in language and image processing. This requires scaling-friendly architectures and large datasets. While transformers have been adapted for 3D vision, a 'GPT-moment' remains elusive due to limited training data. We introduce ARKit LabelMaker, the first large-scale, real-world 3D dataset with dense semantic annotations. Specifically, we enhance ARKitScenes with automatically generated dense labels using an extended LabelMaker pipeline, tailored for large-scale pre-training. Training on this dataset improves accuracy across architectures, achieving state-of-the-art results on ScanNet and ScanNet200, with notable gains on tail classes. We compare our results with self-supervised methods and synthetic data, evaluating the effects on downstream tasks and zero-shot generalization. The dataset will be publicly available.

Subject: CVPR.2025 - Poster