TGSFormer: Scalable Temporal Gaussian Splatting for Embodied Semantic Scene Completion

Authors: Rui Qian, Haozhi Cao, Tianchen Deng, Tianxin Hu, Weixiang Guo, Shenghai Yuan, Lihua Xie

Embodied 3D Semantic Scene Completion (SSC) infers dense geometry and semantics from continuous egocentric observations. Most existing Gaussian-based methods rely on random initialization of many primitives within predefined spatial bounds, resulting in redundancy and poor scalability to unbounded scenes. A recent depth-guided approach alleviates this issue but remains local, suffering from growing latency and memory overhead as the scene scale increases. To overcome these challenges, we propose TGSFormer, a scalable Temporal Gaussian Splatting framework for embodied SSC. It maintains a persistent Gaussian memory for temporal prediction, without relying on image coherence or frame caches. For temporal fusion, a Dual Temporal Encoder jointly processes current and historical Gaussian features through confidence-aware cross-attention. Subsequently, a Confidence-aware Voxel Fusion module merges overlapping primitives into voxel-aligned representations, regulating density and maintaining compactness. Extensive experiments demonstrate that TGSFormer achieves state-of-the-art results on both local and embodied SSC benchmarks, offering superior accuracy and scalability with significantly fewer primitives while maintaining consistent long-term scene integrity. The code will be released upon acceptance.
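The abstract does not specify how Confidence-aware Voxel Fusion is implemented. As a rough illustration of the general idea of merging overlapping primitives into voxel-aligned ones, the sketch below buckets Gaussian centers by voxel index and replaces each bucket with a confidence-weighted mean. The function name `fuse_by_voxel`, the `voxel_size` parameter, the weighted-average rule, and the choice to keep the maximum confidence are all assumptions made for this example, not the authors' method.

```python
import math
from collections import defaultdict


def fuse_by_voxel(means, confidences, voxel_size):
    """Merge primitives that fall in the same voxel (illustrative only).

    means: list of (x, y, z) tuples, the primitive centers.
    confidences: list of positive floats, one per primitive.
    Returns a dict: voxel index (i, j, k) -> (fused_center, fused_confidence).
    """
    # Group primitives by the integer index of the voxel containing them.
    buckets = defaultdict(list)
    for mean, conf in zip(means, confidences):
        key = tuple(math.floor(c / voxel_size) for c in mean)
        buckets[key].append((mean, conf))

    fused = {}
    for key, items in buckets.items():
        total = sum(conf for _, conf in items)
        # Confidence-weighted average of the centers in this voxel.
        center = tuple(
            sum(mean[axis] * conf for mean, conf in items) / total
            for axis in range(3)
        )
        # Assumption: the merged primitive keeps the highest confidence.
        fused[key] = (center, max(conf for _, conf in items))
    return fused


if __name__ == "__main__":
    means = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.2, 0.0, 0.0)]
    confidences = [1.0, 3.0, 2.0]
    for voxel, (center, conf) in fuse_by_voxel(means, confidences, 0.5).items():
        print(voxel, center, conf)
```

With a 0.5 voxel size, the first two primitives share a voxel and collapse into one, so the primitive count is bounded by the number of occupied voxels rather than by the number of observations. A real system would also have to fuse covariances, opacities, and semantic features, which this sketch leaves out.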

Subject: Computer Vision and Pattern Recognition

Publish: 2025-11-29 03:47:14 UTC

2512.00300
