Video Models Start to Solve Chess, Maze, Sudoku, Mental Rotation, and Raven's Matrices

We show that video generation models can now reason. On tasks such as chess, maze, Sudoku, mental rotation, and Raven's Matrices, leading models such as Sora-2 achieve a sixty percent success rate. We establish a robust experimental paradigm centered on a "Task Pair" design. We also build a code framework, with 39 models already available, that supports this paradigm and scales easily: users can add new models and tasks efficiently. We show that our automated evaluation strongly correlates with human judgment, which makes the paradigm highly scalable. Given this paradigm, we see an opportunity to apply reinforcement learning to improve reasoning in video models. All of our raw results are available at https://grow-ai-like-a-child.com/video-reason/ and the VMEvalKit codebase at https://github.com/hokindeng/VMEvalKit.
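The abstract does not state which statistic is used to show that the automated evaluation agrees with human judgment. Purely as an illustration, the sketch below shows one common way to quantify agreement between an automated pass/fail judge and a human rater: Cohen's kappa computed over per-trial success labels (1 = solved, 0 = failed). The function names `success_rate` and `cohens_kappa` and the toy labels are hypothetical; they are not taken from the paper or from VMEvalKit.

```python
from typing import Sequence


def success_rate(labels: Sequence[int]) -> float:
    """Fraction of trials judged successful (1 = solved, 0 = failed)."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(labels) / len(labels)


def cohens_kappa(auto: Sequence[int], human: Sequence[int]) -> float:
    """Chance-corrected agreement between two binary raters.

    Hypothetical helper: `auto` holds the automated judge's labels and
    `human` the human rater's labels for the same trials, in the same order.
    """
    if len(auto) != len(human) or not auto:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(auto)
    # Observed agreement: fraction of trials where both raters gave the same label.
    observed = sum(a == h for a, h in zip(auto, human)) / n
    p_auto, p_human = success_rate(auto), success_rate(human)
    # Agreement expected by chance: both say "solved" or both say "failed".
    expected = p_auto * p_human + (1 - p_auto) * (1 - p_human)
    if expected == 1.0:
        # Both raters gave one identical constant label, so agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels for eight hypothetical trials (not results from the paper).
    auto = [1, 0, 1, 1, 0, 0, 1, 0]
    human = [1, 0, 1, 0, 0, 0, 1, 0]
    print(success_rate(auto), success_rate(human))
    print(cohens_kappa(auto, human))
```

With these toy labels the two raters disagree on one of eight trials, giving a kappa of 0.75. Kappa is used here rather than raw percent agreement because it discounts matches that would occur by chance; a study could equally report Pearson or Spearman correlation on graded scores.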

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence

Published: 2025-11-02 01:22:29 UTC

arXiv: 2512.05969
