Probabilistic Prototype Calibration of Vision-Language Models for Generalized Few-shot Semantic Segmentation

Authors: Jie Liu, Jiayi Shen, Pan Zhou, Jan-Jakob Sonke, Efstratios Gavves

Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototype learning. However, existing prototype-based methods are inherently deterministic, which limits the adaptability of the learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, we propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from pretrained CLIP, thus providing more adaptive prototype learning for GFSS. Specifically, FewCLIP first introduces a prototype calibration mechanism, which refines frozen textual prototypes with learnable visual calibration prototypes, leading to a more discriminative and adaptive representation. Furthermore, unlike deterministic prototype learning techniques, FewCLIP introduces distribution regularization over these calibration prototypes. This probabilistic formulation ensures structured and uncertainty-aware prototype learning, effectively mitigating overfitting to limited novel-class data while enhancing generalization. Extensive experimental results on the PASCAL-$5^i$ and COCO-$20^i$ datasets demonstrate that our proposed FewCLIP significantly outperforms state-of-the-art approaches in both the GFSS and class-incremental settings. The code is available at https://github.com/jliu4ai/FewCLIP.
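
As a rough illustration of the idea described in the abstract, the sketch below calibrates a frozen textual prototype with a sampled visual calibration vector and computes a regularizer on the calibration distribution. It is not the authors' implementation: the additive form of the calibration, the diagonal Gaussian with a standard normal prior, the KL penalty, the cosine-similarity classifier, and every function name here are assumptions made for illustration only. The actual method is in the repository linked above.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sample_calibration(mu, logvar, rng):
    """Reparameterized draw from N(mu, diag(exp(logvar))).

    Assumption: each calibration prototype is a diagonal Gaussian.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def calibrate(text_proto, calib):
    """Assumed additive calibration: frozen text prototype plus visual offset."""
    return l2_normalize([t + c for t, c in zip(text_proto, calib)])


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), used here as a stand-in
    for the distribution regularization mentioned in the abstract."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def classify(feature, prototypes):
    """Assign a pixel feature to the prototype with highest cosine similarity."""
    f = l2_normalize(feature)
    scores = [sum(a * b for a, b in zip(f, p)) for p in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 3-dimensional prototypes for two classes.
    text_protos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    mus = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.2]]
    logvars = [[-4.0] * 3, [-4.0] * 3]

    protos = [calibrate(t, sample_calibration(m, lv, rng))
              for t, m, lv in zip(text_protos, mus, logvars)]
    reg = sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, logvars))
    print("predicted class:", classify([0.9, 0.1, 0.0], protos))
    print("KL regularizer:", reg)
```

In a training loop, the means and log-variances would be the learnable parameters, and the regularizer would be added to the segmentation loss with some weight. Sampling a fresh calibration vector per step is one way a stochastic prototype can discourage overfitting to the few novel-class examples.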

Subject: Computer Vision and Pattern Recognition

Published: 2025-06-28 18:36:22 UTC

2506.22979
