fGuTN7huo5@OpenReview

#1 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Authors: Stefan Stojanov, David Wendt, Seungwoo Kim, Rahul Mysore Venkatesh, Kevin Feigelis, Klemen Kotar, Khai Loong Aw, Jiajun Wu, Daniel LK Yamins

Estimating motion primitives from video (e.g., optical flow and occlusion) is a critically important computer vision problem with many downstream applications, including controllable video generation and robotics. Current solutions are primarily supervised on synthetic data or require tuning of situation-specific heuristics, which inherently limits these models' capabilities in real-world contexts. A natural way to overcome these limitations would be to deploy large-scale, self-supervised video models, which can be trained scalably on unrestricted real-world video datasets. However, despite recent progress, motion-primitive extraction from large pretrained video models remains relatively underexplored. In this work, we describe Opt-CWM, a self-supervised technique for estimating flow and occlusion from a pretrained video prediction model. Opt-CWM uses "counterfactual probes" to extract motion information from a base video model in a zero-shot fashion. The key problem we solve is optimizing the quality of these probes. We do so by combining an efficient parameterization of the space of counterfactual probes with a novel generic sparse-prediction principle for learning the probe-generation parameters in a self-supervised fashion. Opt-CWM achieves state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data.

Subject: NeurIPS.2025 - Spotlight