PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization


Authors: Edward Fish, Andrew Gilbert

Few-shot temporal action localization (TAL) methods that adapt large models via single-prompt tuning often fail to produce precise temporal boundaries. This stems from the model learning a non-discriminative mean representation of an action from sparse data, which compromises generalization. We address this by proposing a new paradigm based on multi-prompt ensembles, where a set of diverse, learnable prompts for each action is encouraged to specialize on compositional sub-events. To enforce this specialization, we introduce PLOT-TAL, a framework that leverages Optimal Transport (OT) to find a globally optimal alignment between the prompt ensemble and the video's temporal features. Our method establishes a new state-of-the-art on the challenging few-shot benchmarks of THUMOS'14 and EPIC-Kitchens, without requiring complex meta-learning. The significant performance gains, particularly at high IoU thresholds, validate our hypothesis and demonstrate the superiority of learning distributed, compositional representations for precise temporal localization.
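The alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an entropy-regularized OT problem solved with Sinkhorn iterations, uniform marginals over prompts and frames, and a cost of one minus cosine similarity. The names `cosine`, `sinkhorn_plan`, and `ot_alignment_score` and the parameters `eps` and `n_iters` are hypothetical and chosen only for illustration.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan with uniform marginals (assumed here)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass per prompt
    b = [1.0 / m] * m  # mass per temporal feature
    # Gibbs kernel: a low cost gives a high affinity.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_alignment_score(prompts, frames, eps=0.1):
    """Plan-weighted similarity between a prompt set and frame features."""
    cost = [[1.0 - cosine(p, f) for f in frames] for p in prompts]
    plan = sinkhorn_plan(cost, eps)
    score = sum(
        plan[i][j] * (1.0 - cost[i][j])
        for i in range(len(prompts))
        for j in range(len(frames))
    )
    return score, plan


# Toy data: two prompts and four frame features in 2-D (illustrative only).
prompts = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
score, plan = ot_alignment_score(prompts, frames)
```

In this toy setup each prompt receives most of the transport mass from the frames it resembles, which is the kind of specialization on sub-events the abstract describes. Because the marginals are fixed, no single prompt can absorb every frame.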

Subjects: Computer Vision and Pattern Recognition, Machine Learning

Publish: 2024-03-27 18:08:14 UTC

2403.18915
