Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation

Authors: Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger

Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks. As VFMs grow in popularity, there is increasing interest in understanding their effectiveness for dense prediction tasks. However, VFMs typically produce low-resolution features, which limits their direct applicability to such tasks. One way to address this limitation is to employ a task-agnostic feature upsampling module that refines the resolution of VFM features. To assess the effectiveness of this approach, we investigate Interactive Segmentation (IS) as a novel benchmark for evaluating feature upsampling methods on VFMs. Because its input is inherently multimodal (an image and a set of user-defined clicks) and its output is a dense mask, IS creates a challenging environment that demands comprehensive visual scene understanding. Our benchmarking experiments show that selecting an appropriate upsampling strategy significantly improves the quality of VFM features. The code is released at https://github.com/havrylovv/iSegProbe
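To make the setup concrete, here is a minimal sketch, not the authors' code, of the two ingredients the abstract describes: upsampling a low-resolution VFM feature grid to image resolution, and combining it with user clicks as the multimodal IS input. It assumes plain bilinear interpolation (the simplest parameter-free, task-agnostic upsampler, used here only as a stand-in for the upsampling methods being benchmarked) and a disk-map click encoding, which is one common choice assumed for illustration. The function names `bilinear_upsample` and `encode_clicks`, the 16x16 grid, the 64 channels, and the click radius are all hypothetical.

```python
import numpy as np


def bilinear_upsample(feats: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Upsample a (C, h, w) feature map to (C, out_h, out_w) with bilinear
    interpolation, using the pixel-centre (align_corners=False) convention."""
    c, h, w = feats.shape
    # Map each output pixel centre back to continuous input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # (out_h, 1) vertical interpolation weights
    wx = (xs - x0)[None, :]  # (1, out_w) horizontal interpolation weights
    # Interpolate horizontally on the top and bottom neighbour rows,
    # then vertically between them; each term has shape (C, out_h, out_w).
    top = feats[:, y0][:, :, x0] * (1 - wx) + feats[:, y0][:, :, x1] * wx
    bot = feats[:, y1][:, :, x0] * (1 - wx) + feats[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy


def encode_clicks(clicks, h: int, w: int, radius: int = 5) -> np.ndarray:
    """Rasterise (row, col, is_positive) clicks into two binary disk maps:
    channel 0 holds positive clicks, channel 1 holds negative clicks."""
    maps = np.zeros((2, h, w), dtype=np.float32)
    rr, cc = np.mgrid[0:h, 0:w]
    for r, c, positive in clicks:
        disk = (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
        maps[0 if positive else 1][disk] = 1.0
    return maps


# Hypothetical sizes: a 16x16 patch grid (e.g. a ViT with patch size 14 on a
# 224x224 image), with only 64 channels to keep the example small.
feats = np.random.default_rng(0).standard_normal((64, 16, 16)).astype(np.float32)
hi_res = bilinear_upsample(feats, 224, 224)
clicks = encode_clicks([(100, 120, True), (30, 40, False)], 224, 224)

# Stack upsampled features and click maps as input to a lightweight IS head.
probe_input = np.concatenate([hi_res, clicks], axis=0)
print(probe_input.shape)  # (66, 224, 224)
```

Bilinear interpolation only smooths the coarse features and cannot recover detail that the patch grid never encoded, which is why a benchmark that compares it against more sophisticated upsamplers on a dense, click-conditioned task like IS is informative.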

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Published: 2025-05-04 11:59:26 UTC

arXiv: 2505.02075