M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving

Authors: Dongyang Xu, Haokun Li, Qingfan Wang, Ziying Song, Lei Chen, Hanming Deng

End-to-end autonomous driving has witnessed remarkable progress. However, the extensive deployment of autonomous vehicles has yet to be realized, primarily due to two challenges: 1) inefficient multi-modal environment perception, i.e., how to integrate data from multi-modal sensors more efficiently; and 2) non-human-like scene understanding, i.e., how to effectively locate and predict critical risky agents in traffic scenarios as an experienced driver would. To overcome these challenges, we propose a Multi-Modal fusion transformer incorporating Driver Attention (M2DA) for autonomous driving. To better fuse multi-modal data and achieve higher alignment between different modalities, we propose a novel Lidar-Vision-Attention-based Fusion (LVAFusion) module. By incorporating driver attention, we endow autonomous vehicles with a human-like scene understanding ability, allowing them to precisely identify crucial areas within complex scenarios and ensure safety. We conduct experiments on the CARLA simulator and achieve state-of-the-art performance with less data in closed-loop benchmarks. Source code is available at https://anonymous.4open.science/r/M2DA-4772.

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics

Published: 2024-03-19 08:54:52 UTC

2403.12552
