Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite Imagery

2507.11040

Authors: Nicolas Drapier, Aladine Chetouani, Aurélien Chateigner

We present GLOD, a transformer-first architecture for object detection in high-resolution satellite imagery. GLOD replaces CNN backbones with a Swin Transformer for end-to-end feature extraction, combined with novel UpConvMixer blocks for robust upsampling and Fusion Blocks for multi-scale feature integration. Our approach achieves 32.95% on xView, outperforming SOTA methods by 11.46%. Key innovations include asymmetric fusion with CBAM attention and a multi-path head design capturing objects across scales. The architecture is optimized for satellite imagery challenges, leveraging spatial priors while maintaining computational efficiency.

Subject: Computer Vision and Pattern Recognition

Publish: 2025-07-15 07:10:34 UTC
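
The abstract names CBAM attention as part of the fusion design but does not describe how it is wired into GLOD. As a rough illustration only, the sketch below follows the commonly published CBAM formulation (channel attention, then spatial attention) in plain NumPy. It is not the authors' Fusion Block: the function names, tensor sizes, reduction ratio `r`, 7x7 kernel, and random weights are all assumptions chosen for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Per-channel gate for x of shape (C, H, W).

    A shared two-layer MLP (w1: (C//r, C), w2: (C, C//r)) is applied to the
    average-pooled and max-pooled channel descriptors, and the results are summed.
    """
    avg = x.mean(axis=(1, 2))  # (C,)
    mx = x.max(axis=(1, 2))    # (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    return sigmoid(mlp(avg) + mlp(mx))  # (C,)


def spatial_attention(x, kernel):
    """Per-pixel gate for x of shape (C, H, W).

    The channel-wise mean and max maps are stacked and convolved with a
    (2, k, k) kernel using zero padding, so the output keeps shape (H, W).
    """
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)


def cbam(x, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

refined = cbam(features, w1, w2, kernel)
print(refined.shape)  # (8, 16, 16)
```

Both gates lie between 0 and 1, so the module can only rescale the input feature map and never changes its shape. In a trained network the MLP and kernel weights would be learned, and the convolution would use a framework operator rather than the explicit loops shown here.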
