Total: 1
Model reprogramming adapts pretrained models to downstream tasks by modifying only the input and output spaces. *Visual reprogramming* (VR) is one instance for vision tasks that adds a trainable noise pattern (i.e., a visual prompt) to input images to facilitate downstream classification. The existing VR approaches for CLIP train a single visual prompt using all descriptions of different downstream classes. However, the limited learning capacity may result in (1) a failure to capture diverse aspects of the descriptions (e.g., shape, color, and texture), and (2) a possible bias toward less informative attributes that do not help distinguish between classes. In this paper, we introduce a decoupling-and-reweighting framework. Our *decoupled visual prompts* (DVP) are optimized using descriptions grouped by explicit **c**au**se**s (DVP-cse) or unsupervised **cl**u**s**ters (DVP-cls). Then, we integrate the outputs of these visual prompts with a *probabilistic reweighting matrix* (PRM) that measures their contributions to each downstream class. Theoretically, DVP lowers the empirical risk bound. Experimentally, DVP outperforms baselines on average across 11 downstream datasets. Notably, the DVP-PRM integration enables insights into how individual visual prompts influence classification decisions, providing a probabilistic framework for understanding reprogramming.