Event-based object detection has attracted increasing attention because event cameras provide high temporal resolution, a wide dynamic range, and an asynchronous address-event representation. Leveraging these advantages, spiking neural networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further improve event-based object detection, this study proposes a novel hybrid spike vision Transformer (HsVT) model. HsVT integrates a spatial feature extraction module, which captures local and global features, with a temporal feature extraction module, which models time dependencies and long-term patterns in event sequences. This combination allows HsVT to capture rich spatiotemporal features and improves its ability to handle complex event-based object detection tasks. To support research in this area, we developed the Fall Detection (Fall DVS) dataset as a new benchmark for event-based object detection; its event-based representation protects facial privacy and reduces memory usage. Experimental results demonstrate that HsVT outperforms existing SNN-based methods and achieves performance competitive with ANN-based models while using fewer parameters and less energy.
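To make the hybrid design concrete, the sketch below illustrates one way a spatial module (local convolution plus global self-attention) can be combined with a temporal module over a sequence of event frames. This is a minimal PyTorch-style example under our own assumptions about shapes and module choices; it is not the authors' implementation of HsVT.

```python
# Illustrative sketch of a hybrid spatial-temporal backbone (not the HsVT code).
import torch
import torch.nn as nn


class SpatialBlock(nn.Module):
    """Local (convolution) + global (self-attention) features per time step."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                              # x: (B, C, H, W)
        x = torch.relu(self.local(x))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class TemporalBlock(nn.Module):
    """Models dependencies across the event-frame sequence (GRU as a placeholder)."""
    def __init__(self, channels: int):
        super().__init__()
        self.rnn = nn.GRU(channels, channels, batch_first=True)

    def forward(self, x):                               # x: (B, T, C)
        out, _ = self.rnn(x)
        return out


class HybridBackbone(nn.Module):
    """Per-step spatial features pooled and fed to a temporal model."""
    def __init__(self, in_channels: int = 2, channels: int = 64):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, kernel_size=3, stride=2, padding=1)
        self.spatial = SpatialBlock(channels)
        self.temporal = TemporalBlock(channels)

    def forward(self, events):                          # events: (B, T, C_in, H, W)
        b, t = events.shape[:2]
        feats = []
        for step in range(t):
            f = self.spatial(self.stem(events[:, step]))
            feats.append(f.mean(dim=(2, 3)))             # global average pool -> (B, C)
        seq = torch.stack(feats, dim=1)                  # (B, T, C)
        return self.temporal(seq)                        # spatiotemporal features for a detection head


# Example: a batch of 8-step event-frame sequences with 2 polarity channels.
model = HybridBackbone()
out = model(torch.randn(4, 8, 2, 128, 128))              # -> (4, 8, 64)
```

In an actual event-based detector, the pooled features would typically feed a detection head, and the dense convolution and attention layers could be replaced by their spiking counterparts; the sketch only shows how spatial and temporal extraction can be composed.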