2511.02564

Total: 1

#1 Seeing Across Time and Views: Multi-Temporal Cross-View Learning for Robust Video Person Re-Identification [PDF1] [Copy] [Kimi] [REL]

Authors: Md Rashidunnabi, Kailash A. Hambarde, Vasco Lopes, Joao C. Neves, Hugo Proenca

Video-based person re-identification (ReID) in cross-view domains (for example, aerial-ground surveillance) remains an open problem because of extreme viewpoint shifts, scale disparities, and temporal inconsistencies. To address these challenges, we propose MTF-CVReID, a parameter-efficient framework that introduces seven complementary modules over a ViT-B/16 backbone. Specifically, we include: (1) Cross-Stream Feature Normalization (CSFN) to correct camera and view biases; (2) Multi-Resolution Feature Harmonization (MRFH) for scale stabilization across altitudes; (3) Identity-Aware Memory Module (IAMM) to reinforce persistent identity traits; (4) Temporal Dynamics Modeling (TDM) for motion-aware short-term temporal encoding; (5) Inter-View Feature Alignment (IVFA) for perspective-invariant representation alignment; (6) Hierarchical Temporal Pattern Learning (HTPL) to capture multi-scale temporal regularities; and (7) Multi-View Identity Consistency Learning (MVICL) that enforces cross-view identity coherence using a contrastive learning paradigm. Despite adding only about 2 million parameters and 0.7 GFLOPs over the baseline, MTF-CVReID maintains real-time efficiency (189 FPS) and achieves state-of-the-art performance on the AG-VPReID benchmark across all altitude levels, with strong cross-dataset generalization to G2A-VReID and MARS datasets. These results show that carefully designed adapter-based modules can substantially enhance cross-view robustness and temporal consistency without compromising computational efficiency. The source code is available at https://github.com/MdRashidunnabi/MTF-CVReID

Subject: Computer Vision and Pattern Recognition

Publish: 2025-11-04 13:37:59 UTC