26823@AAAI

#1 Towards Unified, Explainable, and Robust Multisensory Perception

Author: Yapeng Tian

Humans perceive surrounding scenes through multiple senses with multisensory integration. For example, hearing helps capture the spatial location of a racing car behind us; seeing people's talking faces can strengthen our perception of their speech. However, today's state-of-the-art scene understanding systems are usually designed to rely on a single modality, either audio or visual. Ignoring multisensory cooperation has become one of the key bottlenecks in creating intelligent systems with human-level perception capability, and it impedes the real-world application of existing scene understanding models. To address this limitation, my research has pioneered marrying computer vision with computer audition to create multimodal systems that can learn to understand audio and visual data. In particular, my current research focuses on asking and solving fundamental problems in a fresh research area, audio-visual scene understanding, and strives to develop unified, explainable, and robust multisensory perception machines. The three themes are distinct yet interconnected, and all of them are essential for designing powerful and trustworthy perception systems. In my talk, I will give a brief overview of this new research area and then introduce my work in the three research thrusts.