Wu_Generating_Multimodal_Driving_Scenes_via_Next-Scene_Prediction@CVPR2025@CVF


#1 Generating Multimodal Driving Scenes via Next-Scene Prediction

Authors: Yanhao Wu, Haoyang Zhang, Tianwei Lin, Lichao Huang, Shujie Luo, Rui Wu, Congpei Qiu, Wei Ke, Tong Zhang

Generative models in Autonomous Driving (AD) enable diverse scenario creation, yet existing methods fall short by capturing only a limited range of modalities, which restricts their ability to generate controllable scenarios for comprehensive evaluation of AD systems. In this paper, we introduce a multimodal generation framework that incorporates four major data modalities, including the novel addition of a map modality. With tokenized modalities, our scene sequence generation framework autoregressively predicts each scene while managing computational demands through a two-stage approach: the Temporal AutoRegressive (TAR) component captures inter-frame dynamics for each modality, while the Ordered AutoRegressive (OAR) component aligns modalities within each scene by sequentially predicting tokens in a fixed order. To maintain coherence between the map and ego-action modalities, we introduce the Action-aware Map Alignment (AMA) module, which applies an ego-action-based transformation to keep these two modalities consistent. Our framework effectively generates complex, realistic driving scenarios over extended sequences, ensuring multimodal consistency and offering fine-grained control over scenario elements. Visualizations of the generated multimodal driving scenes can be found in the supplementary materials.
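
The abstract describes a two-stage autoregressive scheme: TAR attends causally over past frames of each modality, and OAR then fills in the next scene's tokens modality by modality in a fixed order. The sketch below is a minimal, hypothetical illustration of that control flow only; the module names, dimensions, token layout, and modality order (`MODALITY_ORDER`, `TemporalAR`, `OrderedAR`, `generate_next_scene`) are assumptions for illustration and are not the authors' implementation, which also includes tokenizers per modality and the AMA module omitted here.

```python
# Hypothetical sketch of next-scene prediction with a temporal stage (TAR-like)
# and an ordered intra-scene stage (OAR-like). Shapes and modules are assumed.
import torch
import torch.nn as nn

MODALITY_ORDER = ["ego_action", "map", "agent_boxes", "image"]  # assumed fixed order


class TemporalAR(nn.Module):
    """Causal transformer over a modality's past scene embeddings (assumed design)."""
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, past):  # past: (B, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(past.size(1)).to(past.device)
        # Return the context vector for the next (to-be-generated) frame.
        return self.encoder(past, mask=mask)[:, -1]


class OrderedAR(nn.Module):
    """Predicts the next scene's tokens one at a time, conditioned on the
    temporal context and on tokens already emitted earlier in the order."""
    def __init__(self, vocab_size=1024, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, temporal_ctx, prefix_tokens):  # ctx: (B, d), prefix: (B, n)
        ctx = temporal_ctx
        if prefix_tokens.numel() > 0:
            # Crude fusion for illustration: average the prefix embeddings.
            ctx = ctx + self.embed(prefix_tokens).mean(dim=1)
        return self.head(ctx)  # logits over the token vocabulary


def generate_next_scene(tar, oar, past, tokens_per_modality=8):
    """Greedy next-scene generation: TAR supplies per-modality temporal context,
    OAR emits the scene token-by-token following MODALITY_ORDER."""
    batch = past[MODALITY_ORDER[0]].size(0)
    prefix = torch.zeros(batch, 0, dtype=torch.long)
    scene = {}
    for modality in MODALITY_ORDER:
        ctx = tar(past[modality])                     # (B, d_model)
        ids = []
        for _ in range(tokens_per_modality):
            logits = oar(ctx, prefix)
            nxt = logits.argmax(dim=-1, keepdim=True)  # (B, 1), greedy decoding
            prefix = torch.cat([prefix, nxt], dim=1)
            ids.append(nxt)
        scene[modality] = torch.cat(ids, dim=1)
    return scene


if __name__ == "__main__":
    tar, oar = TemporalAR(), OrderedAR()
    B, T, D = 2, 4, 256
    # Toy pre-embedded history; in the paper each modality would first be tokenized.
    past = {m: torch.randn(B, T, D) for m in MODALITY_ORDER}
    scene = generate_next_scene(tar, oar, past)
    print({m: v.shape for m, v in scene.items()})      # each modality: (2, 8) token ids
```

Under these assumptions, keeping the intra-scene prediction strictly ordered is what lets later modalities (e.g., images) condition on earlier ones (e.g., ego action and map) within the same scene, which is the consistency property the abstract emphasizes.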

Subject: CVPR.2025 - Poster