qEVxub0h2a@OpenReview


OmniAudio: Generating Spatial Audio from 360-Degree Video

Authors: Huadai Liu, Tianyi Luo, Kaicheng Luo, Qikai Jiang, Peiwen Sun, Jialei Wang, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, ShiLiang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue

Traditional video-to-audio generation techniques primarily focus on perspective video and non-spatial audio, often missing the spatial cues needed to accurately localize sound sources in 3D environments. To address this limitation, we introduce a novel task, \textbf{360V2SA}, which generates spatial audio from 360-degree videos, specifically First-order Ambisonics (FOA) audio, a standard format for representing 3D spatial audio that captures sound directionality and enables realistic 3D audio reproduction. We first create \textbf{Sphere360}, a novel dataset tailored for this task and curated from real-world data, along with an efficient semi-automated pipeline for collecting and cleaning paired video-audio data. To generate spatial audio from 360-degree video, we propose a novel framework, \textbf{OmniAudio}, which leverages self-supervised pre-training on both spatial audio data (in FOA format) and large-scale non-spatial data. Furthermore, OmniAudio features a dual-branch architecture that processes both panoramic and perspective video inputs to capture comprehensive global and local information from 360-degree videos. Experimental results demonstrate that OmniAudio achieves state-of-the-art performance on both objective and subjective metrics on Sphere360. Code and datasets are available at~\href{https://github.com/liuhuadai/OmniAudio}{\texttt{github.com/liuhuadai/OmniAudio}}. The project website is available at \href{https://OmniAudio-360V2SA.github.io}{\texttt{OmniAudio-360V2SA.github.io}}.
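The FOA format mentioned in the abstract encodes a sound field in four channels (W, Y, Z, X in ACN order) whose gains depend on the source direction. A minimal encoder sketch, assuming the common ACN/SN3D convention (`encode_foa` is a hypothetical helper for illustration, not part of the OmniAudio codebase):

```python
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into First-order Ambisonics
    (ACN channel order: W, Y, Z, X; SN3D normalization).
    Angles are in radians; azimuth increases counterclockwise."""
    w = mono                                        # omnidirectional component
    y = mono * np.sin(azimuth) * np.cos(elevation)  # left-right axis
    z = mono * np.sin(elevation)                    # up-down axis
    x = mono * np.cos(azimuth) * np.cos(elevation)  # front-back axis
    return np.stack([w, y, z, x])                   # shape: (4, num_samples)

# Example: a 440 Hz tone placed 90 degrees to the left, at ear level
sr = 16000
t = np.arange(sr) / sr
foa = encode_foa(np.sin(2 * np.pi * 440 * t), azimuth=np.pi / 2, elevation=0.0)
```

For a source hard left at ear level, the Y channel equals W while Z and X vanish, which is how FOA captures the directionality that non-spatial (mono/stereo) generation targets discard.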

Subject: ICML.2025 - Poster