Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems such as self-driving vehicles. While previous supervised approaches rely on costly manual annotations, LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method designed to alleviate this annotation burden. TOP learns powerful spatiotemporal representations by predicting the occupancy states of temporal overlapping points that are commonly observed in the current and adjacent scans. To further ground these representations in the current scene's geometry, we introduce an auxiliary pre-training objective of reconstructing the occupancy of the current scan. Extensive experiments on the nuScenes and SemanticKITTI datasets validate our method's effectiveness. TOP consistently outperforms existing supervised and self-supervised pre-training baselines on both point-level Intersection-over-Union (IoU) and object-level Recall metrics. Notably, it achieves a relative improvement of up to 28.77% over a training-from-scratch baseline and demonstrates strong transferability across LiDAR setups. Our code is publicly available at https://github.com/ZiliangMiao/TOP.
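The abstract describes two pre-training tasks but not their exact formulation. Below is a minimal PyTorch sketch of how such a combined objective could look, assuming binary cross-entropy on occupancy logits; the function name `top_pretraining_loss`, the tensor layout, and the `aux_weight` term are all hypothetical illustrations, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def top_pretraining_loss(pred_overlap_occ: torch.Tensor,
                         gt_overlap_occ: torch.Tensor,
                         pred_current_occ: torch.Tensor,
                         gt_current_occ: torch.Tensor,
                         aux_weight: float = 0.5) -> torch.Tensor:
    """Illustrative combined TOP pre-training objective (assumed form).

    pred_overlap_occ: (N,) predicted occupancy logits for temporal
        overlapping points shared by the current and adjacent scans.
    gt_overlap_occ:   (N,) binary occupancy targets derived from the
        adjacent scans (1 = occupied, 0 = free).
    pred_current_occ, gt_current_occ: analogous tensors for the
        auxiliary current-scan occupancy reconstruction.
    aux_weight: relative weight of the auxiliary term (assumption).
    """
    # Main objective: predict occupancy states of temporal overlapping points.
    main = F.binary_cross_entropy_with_logits(pred_overlap_occ, gt_overlap_occ)
    # Auxiliary objective: reconstruct the occupancy of the current scan,
    # grounding the representation in the current scene's geometry.
    aux = F.binary_cross_entropy_with_logits(pred_current_occ, gt_current_occ)
    return main + aux_weight * aux
```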