Spatially-aware Weights Tokenization for NeRF-Language Models

#1 Spatially-aware Weights Tokenization for NeRF-Language Models [PDF] [Copy] [Kimi] [REL]

Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

Neural Radiance Fields (NeRFs) are neural networks -- typically multilayer perceptrons (MLPs) -- that represent the geometry and appearance of objects, with applications in vision, graphics, and robotics. Recent works propose understanding NeRFs with natural language using Multimodal Large Language Models (MLLMs) that directly process the weights of a NeRF's MLP. However, these approaches rely on a global representation of the input object, making them unsuitable for spatial reasoning and fine-grained understanding. In contrast, we propose **weights2space**, a self-supervised framework featuring a novel meta-encoder that can compute a sequence of spatial tokens directly from the weights of a NeRF. Leveraging this representation, we build **Spatial LLaNA**, a novel MLLM for NeRFs, capable of understanding details and spatial relationships in objects represented as NeRFs. We evaluate Spatial LLaNA on NeRF captioning and NeRF Q&A tasks, using both existing benchmarks and our novel **Spatial ObjaNeRF** dataset consisting of $100$ manually-curated language annotations for NeRFs. This dataset features 3D models and descriptions that challenge the spatial reasoning capability of MLLMs. Spatial LLaNA outperforms existing approaches across all tasks.

Subject: NeurIPS.2025 - Poster

z9MxyboJ7R@OpenReview

#1 Spatially-aware Weights Tokenization for NeRF-Language Models [PDF] [Copy] [Kimi] [REL]