Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization

Authors: Umer Ahmed, Syed Ahmed Mahmood, Fawad Javed Fateh, M. Shaheer Luqman, M. Zeeshan Zia, Quoc-Huy Tran

We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach consisting of two consecutive levels of vector quantization: the lower level associates skeletons with fine-grained subactions, while the higher level aggregates subactions into action-level representations. This hierarchical approach outperforms the non-hierarchical baseline even though it primarily exploits spatial cues, by reconstructing input skeletons. Next, we extend the approach to leverage both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme that performs multi-level clustering while simultaneously recovering input skeletons and their corresponding timestamps. Extensive experiments on multiple benchmarks, including HuGaDB, LARa, and BABEL, demonstrate that our approach achieves new state-of-the-art performance and reduces segment length bias in unsupervised skeleton-based temporal action segmentation.
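The two-level assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper learns its representations and codebooks by reconstructing skeletons and timestamps, whereas this example uses fixed, hand-picked toy codebooks and plain nearest-neighbor lookup. The names `nearest`, `hierarchical_quantize`, and `segments`, the 2-D toy features, and the choice to quantize the subaction code vectors at the higher level are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: List[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean)."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def hierarchical_quantize(
    frames: List[Vector],
    sub_codebook: List[Vector],
    act_codebook: List[Vector],
) -> Tuple[List[int], List[int]]:
    """Assign each frame a subaction code, then map that code to an action code.

    Assumption: the higher level quantizes the selected subaction code vector.
    """
    sub_ids = [nearest(sub_codebook, f) for f in frames]
    act_ids = [nearest(act_codebook, sub_codebook[s]) for s in sub_ids]
    return sub_ids, act_ids


def segments(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-frame labels into (label, start, end) runs, end exclusive."""
    runs = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            runs.append((labels[start], start, t))
            start = t
    return runs


# Toy 2-D "skeleton features" (hypothetical; real inputs are learned embeddings).
frames = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (5.2, 5.1), (5.9, 4.8)]
sub_codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]  # fine-grained
act_codebook = [(0.5, 0.0), (5.5, 5.0)]                          # coarse

sub_ids, act_ids = hierarchical_quantize(frames, sub_codebook, act_codebook)
print(sub_ids)            # [0, 1, 1, 2, 3]
print(act_ids)            # [0, 0, 0, 1, 1]
print(segments(act_ids))  # [(0, 0, 3), (1, 3, 5)]
```

In this toy run, four subaction codes collapse into two action-level segments, which mirrors the idea of aggregating fine-grained subactions into actions. The sketch omits the temporal (timestamp) component entirely.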

Subject: Computer Vision and Pattern Recognition

Published: 2026-04-16 16:24:40 UTC

2604.15196
