2509.21894


LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation

Authors: Yixiao Liu (College of Computer Science, Sichuan University, China), Yizhou Yang (College of Computer Science, Sichuan University, China), Jinwen Li (School of Computer Science and Technology, Xinjiang University, China), Jun Tao (College of Computer Science, Sichuan University, China), Ruoyu Li (College of Computer Science, Sichuan University, China), Xiangkun Wang (College of Computer Science, Sichuan University, China), Min Zhu (College of Computer Science, Sichuan University, China), Junlong Cheng (College of Computer Science, Sichuan University, China)

Remote sensing change detection (RSCD) identifies changes in land cover or surface conditions by analyzing multi-temporal images. Most current deep learning-based methods learn from unimodal visual information alone and neglect the rich semantic information available in other modalities such as text. To address this limitation, we propose a novel Language-Guided Change Detection model (LG-CD). The model uses natural language prompts to direct the network's attention to regions of interest, significantly improving the accuracy and robustness of change detection. Specifically, LG-CD uses a visual foundation model (SAM2) as a feature extractor to capture multi-scale pyramid features, from high to low resolution, from bi-temporal remote sensing images. Multi-layer adapters then fine-tune the model for the downstream task of remote sensing change detection. In addition, we design a Text Fusion Attention Module (TFAM) that aligns visual and textual information so that text prompts can steer the model toward target change regions. Finally, a Vision-Semantic Fusion Decoder (V-SFD) deeply integrates visual and semantic information through a cross-attention mechanism to produce highly accurate change detection masks. Experiments on three datasets (LEVIR-CD, WHU-CD, and SYSU-CD) show that LG-CD consistently outperforms state-of-the-art change detection methods. The approach also offers new insights into achieving generalized change detection by leveraging multimodal information.
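To make the adapter and cross-attention ideas in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`bottleneck_adapter`, `text_cross_attention`), the tensor sizes, the random stand-in features, and the use of an absolute feature difference as the change cue are all assumptions made for illustration. The abstract does not specify how TFAM or V-SFD are built internally.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_adapter(x, w_down, w_up):
    # Hypothetical residual adapter: x + up(relu(down(x))).
    return x + np.maximum(x @ w_down, 0.0) @ w_up

def text_cross_attention(visual, text, w_q, w_k, w_v):
    # Visual tokens (N, d) act as queries; text tokens (T, d) as keys/values.
    q, k, v = visual @ w_q, text @ w_k, text @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, T)
    return visual + attn @ v, attn  # residual fusion

rng = np.random.default_rng(0)
d, r = 32, 8                    # feature width and adapter bottleneck (assumed)
n_tokens, n_words = 16 * 16, 6  # one 16x16 feature map, 6 prompt tokens (assumed)

# Random stand-ins for frozen-backbone features of the two dates and for
# the encoded text prompt; a real pipeline would take these from encoders.
feat_t1 = rng.standard_normal((n_tokens, d))
feat_t2 = rng.standard_normal((n_tokens, d))
text = rng.standard_normal((n_words, d))

w_down = rng.standard_normal((d, r)) * 0.1
w_up = np.zeros((r, d))  # zero init: the adapter starts as an identity map
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

a1 = bottleneck_adapter(feat_t1, w_down, w_up)
a2 = bottleneck_adapter(feat_t2, w_down, w_up)
diff = np.abs(a1 - a2)  # assumed change cue, not taken from the paper
fused, attn = text_cross_attention(diff, text, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

In this sketch only the small adapter matrices would be trained while the backbone features stay fixed, which is the usual motivation for adapter-style fine-tuning. Each row of `attn` is a distribution over prompt tokens, so it shows which words a given spatial location attends to.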

Subject: Computer Vision and Pattern Recognition

Publish: 2025-09-26 05:30:11 UTC