#1 When Pixel Difference Patterns Meet ViT: PiDiViT for Few-Shot Object Detection

Authors: Hongliang Zhou, Yongxiang Liu, Canyu Mo, Weijie Li, Bowen Peng, Li Liu

Few-shot object detection aims to detect novel classes from limited samples. Recent methods leverage the rich semantic representations of pretrained vision transformers (ViTs) to overcome the limitations of model fine-tuning, improving performance on novel classes. However, existing pretrained-ViT schemes perform transformer encoding only in the feature dimension, ignoring pixel-wise differences in low-level features and multiscale variations. The remaining challenges are: (i) extracted features suffer from blurred boundaries and an overly smooth transition from object center to boundary, leaving objects insufficiently distinguished from the background, and (ii) balancing the extraction of local details against global contour features in multiscale scenarios. To address these issues, the Pixel Difference Vision Transformer (PiDiViT) is proposed. Its innovations are: (i) a difference convolution fusion module (DCFM), which enhances feature differences from object centers to boundaries while preserving global information by fusing pixel-wise central difference features with the original features through an attention mechanism (see the first sketch below), and (ii) a multiscale feature fusion module (MFFM), which adaptively fuses features extracted by convolutional kernels of five different scales, using a scale attention mechanism to generate the fusion weights and thereby balancing local detail against global semantic information (a second sketch follows). PiDiViT achieves state-of-the-art results on the COCO benchmark: it surpasses the few-shot detection SOTA by 2.7 nAP50 (10-shot) and 4.0 nAP50 (30-shot) on novel classes, exceeds the one-shot detection SOTA by 4.4 nAP50, and exceeds the open-vocabulary detection SOTA by 3.7 nAP50. The code is available at https://github.com/Seaz9/PiDiViT.
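
The operation behind the DCFM is a central difference convolution: each kernel tap sees the difference between a neighboring pixel and the patch center, which sharpens center-to-boundary transitions. Algebraically, this equals a vanilla convolution minus a 1x1 convolution whose weight is the spatial sum of the kernel. The PyTorch sketch below illustrates that identity together with a simple attention-based fusion of the difference features with the original features; the sigmoid gate, the `theta` blending factor, and all module names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Central difference convolution: each tap computes w_i * (x_i - x_center),
    i.e., a vanilla conv minus a 1x1 conv whose weight is the kernel's spatial sum."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.theta = theta  # 0 = vanilla conv, 1 = pure central difference

    def forward(self, x):
        out = self.conv(x)
        if self.theta == 0:
            return out
        # equivalent 1x1 kernel for the center term
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        center = F.conv2d(x, kernel_sum)
        return out - self.theta * center

class DCFMSketch(nn.Module):
    """Hypothetical fusion: a per-pixel sigmoid gate blends boundary-enhanced
    difference features with the original features, preserving global information."""
    def __init__(self, channels):
        super().__init__()
        self.cdc = CentralDifferenceConv2d(channels, channels)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        diff = self.cdc(x)
        a = self.gate(torch.cat([x, diff], dim=1))  # per-pixel attention weights
        return a * diff + (1 - a) * x
```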
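
The MFFM is described as fusing features from five convolutional kernels of different scales, with a scale attention mechanism producing the fusion weights. Below is a minimal sketch under that description; the specific kernel sizes and the pooling-based attention head are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class MFFMSketch(nn.Module):
    """Five parallel convolutions of increasing kernel size, fused by
    softmax scale-attention weights pooled from the input."""
    def __init__(self, channels, kernel_sizes=(1, 3, 5, 7, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        # scale attention: global average pooling -> one weight per branch
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes), kernel_size=1),
        )

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, S, C, H, W)
        w = self.attn(x).softmax(dim=1).unsqueeze(2)               # (B, S, 1, 1, 1)
        return (w * feats).sum(dim=1)  # adaptive mix of local detail and global context

# usage: y = MFFMSketch(256)(torch.randn(1, 256, 32, 32))  # y: (1, 256, 32, 32)
```

Small kernels in this mix favor local detail while the larger ones capture global contours; the learned softmax weights let the module pick the balance per input.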

Subject: ICCV.2025 - Poster