3489@2024@ECCV

#1 Spherical World-Locking for Audio-Visual Localization in Egocentric Videos

Authors: Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, Calvin Murdock

Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to the wearer’s behaviors. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which implicitly transforms multisensory streams with respect to measurements of the wearer’s head orientation. Compared to conventional head-locked egocentric representations with a 2D planar field-of-view, SWL effectively offsets challenges posed by self-motion, allowing for improved spatial synchronization between input modalities. Using a set of multisensory embeddings on a world-locked sphere, we design a unified encoder-decoder transformer architecture that preserves the spherical structure of the scene representation, without requiring expensive image-to-sphere projections. We evaluate the effectiveness of the proposed framework on multiple benchmark tasks for egocentric video understanding, including active speaker localization in noisy conversations, audio-based spherical sound source localization, and behavior anticipation in everyday activities.
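The geometric idea behind world-locking can be illustrated with a small sketch. This is not the authors' implementation (the abstract describes an implicit transformation inside a transformer); it only shows, under simplifying assumptions, how a direction measured in the head frame can be re-expressed in a world frame given the head orientation. The function names (`sph_to_vec`, `yaw_matrix`, `world_lock`), the yaw-only head rotation, and the axis conventions are all assumptions made for illustration.

```python
import math

def sph_to_vec(azimuth, elevation):
    """Unit vector from azimuth (about the z axis, measured from x) and
    elevation (from the xy-plane), both in radians. Convention is assumed."""
    return (
        math.cos(elevation) * math.cos(azimuth),
        math.cos(elevation) * math.sin(azimuth),
        math.sin(elevation),
    )

def vec_to_sph(v):
    """Inverse of sph_to_vec: returns (azimuth, elevation) in radians."""
    x, y, z = v
    return math.atan2(y, x), math.asin(max(-1.0, min(1.0, z)))

def yaw_matrix(yaw):
    """Head-to-world rotation for a head turned by `yaw` about the vertical axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def rotate(R, v):
    """Apply the 3x3 rotation matrix R to the vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def transpose(R):
    """Transpose of a 3x3 matrix (the inverse, for a rotation)."""
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))

def world_lock(direction_head, R_head_to_world):
    """Re-express a head-frame direction in the world frame."""
    return rotate(R_head_to_world, direction_head)

# Hypothetical scenario: a sound source fixed at 30 degrees azimuth in the world,
# observed while the wearer turns their head to 0, 20, and 45 degrees of yaw.
source_world = sph_to_vec(math.radians(30.0), 0.0)
head_azimuths, world_azimuths = [], []
for yaw_deg in (0.0, 20.0, 45.0):
    R = yaw_matrix(math.radians(yaw_deg))
    # What a head-mounted sensor would observe (world-to-head is the transpose).
    observed_head = rotate(transpose(R), source_world)
    head_azimuths.append(math.degrees(vec_to_sph(observed_head)[0]))
    # Undo the head rotation using the measured orientation.
    locked = world_lock(observed_head, R)
    world_azimuths.append(math.degrees(vec_to_sph(locked)[0]))
```

In this toy setting the head-locked azimuth drifts as the head turns, while the world-locked azimuth stays fixed at the source's true direction, which is the self-motion compensation the abstract refers to.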

Subject: ECCV.2024 - Poster