2412.18386

Switch-a-View: Few-Shot View Selection Learned from Edited Videos

Authors: Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

We introduce Switch-a-View, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled (but human-edited) video samples. We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint (egocentric or exocentric), and then discovers the patterns between those view-switch moments on the one hand and the visual and spoken content in the how-to video on the other hand. Armed with this predictor, our model then takes an unseen multi-view video as input and orchestrates which viewpoint should be displayed when. We further introduce a few-shot training setting that permits steering the model towards a new data domain. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D and rigorously validate its advantages.
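
To make the notion of "view-switch moments" concrete, here is a minimal sketch of how such moments could be read off once each segment of an edited video carries a pseudo-label. It is not the authors' implementation: the function name `find_view_switches`, the `"ego"`/`"exo"` string labels, and the example sequence are all assumptions made for illustration, and the pseudo-labeling step itself is not shown.

```python
from typing import List, Tuple


def find_view_switches(segment_views: List[str]) -> List[Tuple[int, str, str]]:
    """Return (segment_index, previous_view, new_view) for every view switch.

    `segment_views` is a hypothetical per-segment pseudo-label sequence,
    one label per consecutive segment of a single edited video.
    """
    switches = []
    for i in range(1, len(segment_views)):
        prev, cur = segment_views[i - 1], segment_views[i]
        if cur != prev:
            # The primary viewpoint changes at the start of segment i.
            switches.append((i, prev, cur))
    return switches


# Hypothetical pseudo-labels for six consecutive segments (illustrative only).
pseudo_labels = ["exo", "exo", "ego", "ego", "ego", "exo"]
switches = find_view_switches(pseudo_labels)
```

In this toy sequence the primary view changes at segments 2 and 5. A predictor in the spirit of the abstract would then be trained to relate such switch points to the surrounding visual and spoken content.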

Subject: Computer Vision and Pattern Recognition

Published: 2024-12-24 12:16:43 UTC