When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective

2409.01821

Authors: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Tsung-Yi Ho

Adapting pre-trained models to new tasks can exhibit varying effectiveness across datasets. Visual prompting, a state-of-the-art parameter-efficient transfer learning method, can significantly improve performance on out-of-distribution tasks. On the other hand, linear probing, a standard transfer learning method, can sometimes become the best approach. We propose a log-likelihood ratio (LLR) approach to analyze the comparative benefits of visual prompting and linear probing. By employing the LLR score alongside resource-efficient visual prompt approximations, our cost-effective measure attains up to a 100-fold reduction in run time compared to full training, while achieving prediction accuracies up to 91%. The source code is available at https://github.com/IBM/VP-LLR.

Subjects: Computer Vision and Pattern Recognition, Machine Learning

Publish: 2024-09-03 12:03:45 UTC
