Infrared and visible image fusion (IVF) aims to produce composite images that combine the complementary strengths of the two modalities. This paper reveals, for the first time, the intrinsic "attention properties" of infrared images, which arise directly from their physical characteristics (i.e., heat distribution) and can be linked naturally to attention mechanisms, as observed in gradient-weighted class activation mapping (Grad-CAM) visualizations of image classification models. To exploit this property for better fusion, we propose the source infrared cross attention (I-SCA) and extend it to the visible modality, introducing the source visible cross attention (V-SCA). The joint use of I-SCA and V-SCA greatly alleviates longstanding issues in IVF, such as insufficient and incomplete multimodal feature interaction and fusion. Moreover, an auxiliary component for I-SCA and V-SCA, termed CBSM, is employed to enhance the channel and spatial representations of the source images while suppressing their redundant and misleading information. Specifically, we treat the CBSM-processed raw image directly as the query, while the intermediate features of the other modality serve as the keys and values in I-SCA and V-SCA. Unlike attention mechanisms that divide images into patches or restrict computation to local windows, our cross attention modules achieve smoother and more robust IVF through true global modeling across the entire image space with linear complexity. Comparisons with current state-of-the-art (SOTA) methods on three popular public datasets confirm the superiority of our approach.
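The abstract describes a cross attention in which one modality's raw image supplies the query while the other modality's intermediate features supply the keys and values, computed globally with linear complexity. A minimal sketch of this idea, using the standard kernelized linear-attention trick, is shown below; the feature map, shapes, and function name are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def linear_cross_attention(query_img, key_feats, value_feats, eps=1e-6):
    """Sketch of global cross attention with linear complexity.

    query_img  : (N, d) flattened pixels of one modality's processed raw
                 image, acting as the query (e.g., the infrared input in I-SCA)
    key_feats  : (M, d) intermediate features of the other modality (keys)
    value_feats: (M, d) intermediate features of the other modality (values)

    The positive feature map phi(x) = relu(x) + eps is an assumption; the
    paper's exact kernel is not specified in the abstract.
    """
    phi = lambda x: np.maximum(x, 0.0) + eps       # positive feature map
    Q, K = phi(query_img), phi(key_feats)          # (N, d), (M, d)
    kv = K.T @ value_feats                         # (d, d): aggregate keys/values once
    normalizer = Q @ K.sum(axis=0)                 # (N,): per-query normalization
    return (Q @ kv) / normalizer[:, None]          # (N, d); cost O(N*d^2), not O(N*M)

# Toy usage: a flattened 8x8 "infrared" query attending to "visible" features.
rng = np.random.default_rng(0)
fused = linear_cross_attention(rng.normal(size=(64, 16)),
                               rng.normal(size=(64, 16)),
                               rng.normal(size=(64, 16)))
print(fused.shape)  # (64, 16)
```

Because the key-value aggregation `K.T @ V` is computed once, every query position attends to the entire feature map without the quadratic cost of patch-by-patch attention, which is what makes truly global modeling tractable at image resolution.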