Emergent Extreme-View Geometry in 3D Foundation Models

Authors: Yiwen Zhang, Joseph Tung, Ruojin Cai, David Fouhey, Hadar Averbuch-Elor

3D foundation models (3DFMs) have recently transformed 3D vision, enabling joint prediction of depths, poses, and point maps directly from images. Yet their ability to reason under extreme, non-overlapping views remains largely unexplored. In this work, we study their internal representations and find that 3DFMs exhibit an emergent understanding of extreme-view geometry, despite never being trained for such conditions. To further enhance these capabilities, we introduce a lightweight alignment scheme that refines their internal 3D representation by tuning only a small subset of backbone bias terms, leaving all decoder heads frozen. This targeted adaptation substantially improves relative pose estimation under extreme viewpoints without degrading per-image depth or point quality. Additionally, we contribute MegaUnScene, a new benchmark of Internet scenes unseen by existing 3DFMs, with dedicated test splits for both relative pose estimation and dense 3D reconstruction. All code and data will be released.
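
The alignment scheme described above tunes only backbone bias terms while every decoder head stays frozen. The sketch below illustrates that parameter selection in the spirit of bias-only fine-tuning (as in BitFit). It is not the authors' code. Everything in it is an assumption: the state-dict naming scheme (`backbone.blocks.<i>...`, `camera_head.*`, `depth_head.*`), the helper names `select_trainable` and `trainable_fraction`, and the choice to define the "small subset" by backbone block index are all hypothetical. The abstract does not say which biases are tuned.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical parameter names and shapes standing in for a 3DFM state dict.
# Real models, and the paper's actual subset of tuned biases, will differ.
example_params: Dict[str, Tuple[int, ...]] = {
    "backbone.blocks.0.attn.qkv.weight": (3072, 1024),
    "backbone.blocks.0.attn.qkv.bias": (3072,),
    "backbone.blocks.1.mlp.fc1.weight": (4096, 1024),
    "backbone.blocks.1.mlp.fc1.bias": (4096,),
    "camera_head.proj.weight": (9, 1024),  # decoder head: always frozen
    "camera_head.proj.bias": (9,),         # head bias: also frozen
    "depth_head.conv.bias": (1,),
}


def select_trainable(
    param_shapes: Dict[str, Tuple[int, ...]],
    backbone_prefix: str = "backbone.",
    block_ids: Optional[Set[int]] = None,
) -> Set[str]:
    """Return names of parameters to tune: bias terms inside the backbone only.

    Backbone weights and all decoder-head parameters stay frozen. If block_ids
    is given, only biases in those backbone blocks are tuned (an assumed way
    of choosing a "small subset").
    """
    trainable: Set[str] = set()
    for name in param_shapes:
        if not name.startswith(backbone_prefix):
            continue  # decoder heads and other modules stay frozen
        if not name.endswith(".bias"):
            continue  # weight matrices stay frozen; only bias vectors are tuned
        if block_ids is not None:
            # Assumed naming scheme after the prefix: "blocks.<i>.<...>.bias"
            parts = name[len(backbone_prefix):].split(".")
            if len(parts) < 2 or parts[0] != "blocks" or not parts[1].isdigit():
                continue
            if int(parts[1]) not in block_ids:
                continue
        trainable.add(name)
    return trainable


def numel(shape: Iterable[int]) -> int:
    """Number of scalar entries in a tensor of the given shape."""
    n = 1
    for d in shape:
        n *= d
    return n


def trainable_fraction(param_shapes: Dict[str, Tuple[int, ...]], trainable: Set[str]) -> float:
    """Fraction of all scalar parameters that the selection leaves trainable."""
    total = sum(numel(s) for s in param_shapes.values())
    tuned = sum(numel(param_shapes[n]) for n in trainable)
    return tuned / total


if __name__ == "__main__":
    all_biases = select_trainable(example_params)
    # {"backbone.blocks.0.attn.qkv.bias", "backbone.blocks.1.mlp.fc1.bias"}
    print(sorted(all_biases))
    print(sorted(select_trainable(example_params, block_ids={1})))
    print(trainable_fraction(example_params, all_biases))  # under 0.1% here
```

In a PyTorch model, one could apply the returned set by looping over `model.named_parameters()` and setting `p.requires_grad = name in trainable`. Selecting by name keeps the frozen decoder heads and the tuned backbone biases explicit, and it makes the tiny trainable fraction easy to check.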

Subject: Computer Vision and Pattern Recognition

Published: 2025-11-27 18:40:03 UTC

arXiv: 2511.22686