
POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction

Authors: Songyan Zhang, Yongtao Ge, Jinyuan Tian, Guangkai Xu, Hao Chen, Chen Lv, Chunhua Shen

Recent approaches to 3D reconstruction in dynamic scenes rely primarily on integrating separate geometry-estimation and matching modules, where the latter plays a critical role in distinguishing dynamic regions and mitigating interference from moving objects. The matching module also models object motion explicitly, enabling the tracking of specific targets and advancing motion understanding in complex scenarios. The pointmap representation recently proposed in DUSt3R offers a potential way to unify geometry estimation and matching in 3D space, reducing computational overhead by eliminating redundant auxiliary modules. However, it still struggles with ambiguous correspondences in dynamic regions, which limits reconstruction performance in such scenarios. In this work, we present POMATO, a unified framework for dynamic 3D reconstruction that marries POintmap MAtching with Temporal mOtion. Specifically, our method first learns an explicit matching relationship by mapping RGB pixels across different views to 3D pointmaps within a unified coordinate system. We then introduce a temporal motion module for dynamic motions that ensures scale consistency across frames and improves performance on 3D reconstruction tasks requiring both precise geometry and reliable matching, most notably 3D point tracking. We demonstrate the effectiveness of POMATO through strong performance on multiple downstream tasks, including video depth estimation, 3D point tracking, and pose estimation. Code and models are publicly available at https://github.com/wyddmw/POMATO.
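To make the core idea concrete, here is a minimal sketch of pointmap-based matching: once two views are expressed as pointmaps in a shared coordinate system (the setting DUSt3R-style models produce), cross-view correspondences can be read off by nearest-neighbour search in 3D. This is an illustration only, not the authors' implementation; the function name `match_pointmaps` and the `(H, W, 3)` shapes are assumptions for the example.

```python
import numpy as np

def match_pointmaps(pm_a: np.ndarray, pm_b: np.ndarray) -> np.ndarray:
    """For each pixel in view A, return the flat index of the nearest
    3D point in view B. pm_a, pm_b: (H, W, 3) pointmaps expressed in
    the same coordinate frame."""
    pts_a = pm_a.reshape(-1, 3)  # (H*W, 3)
    pts_b = pm_b.reshape(-1, 3)  # (H*W, 3)
    # Pairwise squared distances between all points of the two views.
    d2 = ((pts_a[:, None, :] - pts_b[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)     # (H*W,) indices into view B

# Toy check: view B is view A with its pixels permuted, so nearest-
# neighbour matching should recover the inverse permutation exactly.
h, w = 4, 4
rng = np.random.default_rng(0)
pm_a = rng.normal(size=(h, w, 3))
perm = rng.permutation(h * w)
pm_b = pm_a.reshape(-1, 3)[perm].reshape(h, w, 3)
matches = match_pointmaps(pm_a, pm_b)
```

The ambiguity the paper targets shows up precisely here: for moving objects, a point's 3D location differs between frames, so plain nearest-neighbour matching breaks down unless motion is modelled, which motivates the temporal motion module.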

Subject: ICCV.2025 - Poster