SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input

Authors: Zhen Lv, Yangqi Long, Congzhentao Huang, Cao Li, Chengfei Lv, Hao Ren, Dian Zheng

Stereo video synthesis from monocular input is a demanding task in spatial computing and virtual reality. The main challenges lie in the scarcity of high-quality paired stereo videos for training and the difficulty of maintaining spatio-temporal consistency between frames. Existing methods mainly handle these problems by directly applying novel view synthesis (NVS) techniques to video, which is inherently ill-suited to the task. In this paper, we introduce a novel self-supervised stereo video synthesis paradigm built on a video diffusion model, termed SpatialDreamer, which meets these challenges head-on. First, to address the shortage of stereo video data, we propose a Depth-based Video Generation (DVG) module, which employs a forward-backward rendering mechanism to generate paired videos with geometric and temporal priors. Leveraging the data generated by DVG, we propose RefinerNet together with a self-supervised synthesis framework designed to enable efficient and dedicated training. More importantly, we devise a consistency control module, consisting of a stereo deviation strength metric and a Temporal Interaction Learning (TIL) module, to ensure geometric and temporal consistency, respectively. We evaluate the proposed method against various benchmark methods, with the results showcasing its superior performance.
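The abstract does not spell out how DVG's forward-backward rendering works. As a rough, generic illustration of depth-based view rendering (the usual starting point for such a module), the sketch below forward-warps a monocular left frame into a hypothetical right view using disparity derived from depth. The rectified pinhole-camera setup, the chosen focal length and baseline, and the nearest-surface occlusion rule are illustrative assumptions, not the paper's implementation; the disoccluded holes left behind are what a refinement network would then need to fill.

```python
# Minimal sketch: depth-based forward warping of a left frame into a right view.
# Assumes a rectified pinhole setup; focal_px and baseline_m are hypothetical values.
import numpy as np

def depth_to_disparity(depth, focal_px, baseline_m):
    """Convert per-pixel depth (meters) to horizontal disparity (pixels)."""
    return focal_px * baseline_m / np.clip(depth, 1e-6, None)

def forward_warp_to_right(left_img, disparity):
    """Splat each left-view pixel into the right view by shifting it left by its
    disparity; larger disparity (closer surface) wins when pixels collide.
    Returns the warped image and a validity mask (False at disocclusion holes)."""
    h, w = disparity.shape
    right = np.zeros_like(left_img)
    z_buf = np.full((h, w), -np.inf)          # disparity acts as an inverse-depth z-buffer
    ys, xs = np.mgrid[0:h, 0:w]
    xs_tgt = np.round(xs - disparity).astype(int)   # rectified: shift along x only
    valid = (xs_tgt >= 0) & (xs_tgt < w)
    for y, x, xt, d in zip(ys[valid], xs[valid], xs_tgt[valid], disparity[valid]):
        if d > z_buf[y, xt]:
            z_buf[y, xt] = d
            right[y, xt] = left_img[y, x]
    return right, np.isfinite(z_buf)

# Usage example with a synthetic frame and a fronto-parallel depth plane.
left = np.random.rand(120, 160, 3).astype(np.float32)
depth = np.full((120, 160), 2.0, dtype=np.float32)   # 2 m everywhere
disp = depth_to_disparity(depth, focal_px=300.0, baseline_m=0.065)
right, mask = forward_warp_to_right(left, disp)
```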

Subject: CVPR.2025 - Poster