STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning

Authors: Guilian Chen, Huisi Wu, Jing Qin

Automated segmentation of polyps from colonoscopy videos is of great clinical significance, as it can assist clinicians in making accurate diagnoses and precise interventions. However, video polyp segmentation (VPS) is challenging due to ambiguous polyp boundaries, as well as variations in polyp scale, contrast, and position across consecutive frames. Moreover, to meet clinical requirements, inference must run in real time to enable intraoperative tracking and guidance. In this paper, we propose a novel and efficient segmentation network, STDDNet, which integrates a spatial-aligned temporal modeling strategy and a discriminative dynamic representation learning mechanism, to comprehensively address these challenges by harnessing the advantages of Mamba. Specifically, a spatial-aligned temporal dependency propagation (STDP) module is developed to model temporal consistency across consecutive frames based on a bidirectional scanning Mamba block. Furthermore, we design a discriminative dynamic feature extraction (DDFE) module to explore frame-wise dynamic information from the structural features generated by the Mamba block. These dynamic features effectively handle the variations across colonoscopy frames, providing more detail for refined segmentation. We extensively evaluate STDDNet on two benchmark datasets, SUN-SEG and CVC-ClinicDB, demonstrating superior segmentation performance compared to state-of-the-art methods while maintaining real-time inference. Code is available at https://github.com/C-GLGLGL/STDDNet.
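To give intuition for the bidirectional scanning idea underlying the STDP module, the sketch below implements a toy bidirectional linear recurrence over a sequence of frame features. This is an illustrative assumption, not the paper's actual architecture: the `decay` and `gain` scalars stand in for Mamba's learned, input-dependent state-space parameters, and real Mamba blocks operate on projected channel states with selective gating.

```python
import numpy as np

def linear_scan(x, decay, gain):
    """Toy causal recurrence h_t = decay * h_{t-1} + gain * x_t along axis 0 (time).

    A crude stand-in for a selective state-space scan; in Mamba the
    equivalents of `decay` and `gain` are learned and input-dependent.
    """
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + gain * x[t]
        out[t] = h
    return out

def bidirectional_scan(x, decay=0.9, gain=0.1):
    """Fuse a forward scan (context from earlier frames) with a backward
    scan (context from later frames), as in a bidirectional Mamba block."""
    fwd = linear_scan(x, decay, gain)
    bwd = linear_scan(x[::-1], decay, gain)[::-1]
    return fwd + bwd

# Usage: 8 frames of flattened spatial features (hypothetical shapes).
frames = np.random.default_rng(0).normal(size=(8, 16))
fused = bidirectional_scan(frames)
print(fused.shape)  # (8, 16)
```

The two scans give every frame access to both past and future context, which is what lets the temporal module enforce consistency across a clip rather than conditioning on one direction only.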

Subject: ICCV.2025 - Poster