Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts

Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

Model reprogramming adapts pretrained models to downstream tasks by modifying only the input and output spaces. *Visual reprogramming* (VR) is an instance of this approach for vision tasks: it adds a trainable noise pattern (i.e., a visual prompt) to input images to facilitate downstream classification. Existing VR approaches for CLIP train a single visual prompt using the descriptions of all downstream classes. However, the limited learning capacity of a single prompt may result in (1) a failure to capture diverse aspects of the descriptions (e.g., shape, color, and texture), and (2) a possible bias toward less informative attributes that do not help distinguish between classes. In this paper, we introduce a decoupling-and-reweighting framework. Our *decoupled visual prompts* (DVP) are optimized using descriptions grouped by explicit **c**au**se**s (DVP-cse) or unsupervised **cl**u**s**ters (DVP-cls). We then integrate the outputs of these visual prompts with a *probabilistic reweighting matrix* (PRM) that measures each prompt's contribution to each downstream class. Theoretically, DVP lowers the empirical risk bound. Experimentally, DVP outperforms baselines on average across 11 downstream datasets. Notably, the DVP-PRM integration offers insight into how individual visual prompts influence classification decisions, providing a probabilistic framework for understanding reprogramming.
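The decoupling-and-reweighting idea can be illustrated with a minimal NumPy sketch of the inference step. This is not the authors' implementation. The frozen CLIP encoders are replaced by a hypothetical random linear image encoder and random text embeddings (one per prompt group and class, standing in for that group's descriptions). Each visual prompt is assumed to be a plain additive pattern over the whole input. The score for class c is assumed to be a PRM-weighted sum of the per-prompt similarities, where each PRM column is a distribution over the K prompts. The PRM here is randomly initialized, and training the prompts or estimating the PRM is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K prompts, C classes, D embedding dim, P flattened pixels.
K, C, D, P = 3, 4, 16, 32

# Hypothetical stand-in for a frozen image encoder: a fixed random linear map
# followed by L2 normalization (CLIP-style unit-norm features).
W_img = rng.standard_normal((P, D))

def encode_image(x):
    z = x @ W_img
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Hypothetical text embeddings, one per (prompt group k, class c), e.g. the
# averaged embedding of the descriptions assigned to group k for class c.
text_emb = rng.standard_normal((K, C, D))
text_emb /= np.linalg.norm(text_emb, axis=-1, keepdims=True)

# K decoupled visual prompts: additive patterns with the same shape as the input.
prompts = 0.1 * rng.standard_normal((K, P))

# Placeholder PRM of shape (K, C): a softmax over the prompt axis makes each
# column c a probability distribution over the K prompts.
raw = rng.standard_normal((K, C))
prm = np.exp(raw) / np.exp(raw).sum(axis=0, keepdims=True)

def predict_logits(x):
    # Cosine similarity of each prompted image to its group's class embeddings: (K, C)
    sims = np.stack([encode_image(x + prompts[k]) @ text_emb[k].T for k in range(K)])
    # Reweight per-prompt scores class by class, then sum over prompts: (C,)
    return (prm * sims).sum(axis=0)

x = rng.standard_normal(P)  # a single flattened "image"
logits = predict_logits(x)
print(logits.shape, int(np.argmax(logits)))
```

Because each PRM column sums to one in this sketch, a class's final score is a convex combination of its per-prompt similarities. Inspecting `prm[:, c]` therefore shows which prompt dominates the decision for class `c`, which mirrors the kind of interpretability the abstract attributes to the DVP-PRM integration.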

Subject: ICML.2025 - Poster

Ne5brB1tKN@OpenReview