Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval

Authors: Dao Sy Duy Minh, Huynh Trung Kiet, Nguyen Lam Phu Quy, Phu-Hoa Pham, Tran Chi Nguyen

Retrieving images from natural language descriptions is a core task at the intersection of computer vision and natural language processing, with wide-ranging applications in search engines, media archiving, and digital content management. However, real-world image-text retrieval remains challenging due to vague or context-dependent queries, linguistic variability, and the need for scalable solutions. In this work, we propose a lightweight two-stage retrieval pipeline that leverages event-centric entity extraction to incorporate temporal and contextual signals from real-world captions. The first stage performs efficient candidate filtering using BM25 based on salient entities, while the second stage applies BEiT-3 models to capture deep multimodal semantics and rerank the results. Evaluated on the OpenEvents v1 benchmark, our method achieves a mean average precision of 0.559, substantially outperforming prior baselines. These results highlight the effectiveness of combining event-guided filtering with long-text vision-language modeling for accurate and efficient retrieval in complex, real-world scenarios. Our code is available at https://github.com/PhamPhuHoa-23/Event-Based-Image-Retrieval
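The two-stage design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `bm25_scores`, `two_stage_retrieve`), the BM25 parameters `k1` and `b`, and the whitespace tokenizer are all assumptions made for illustration, and `rerank_score` is a hypothetical stand-in for whatever score a vision-language model such as BEiT-3 would assign to a candidate.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def two_stage_retrieve(query_entities, captions, rerank_score, top_k=2):
    """Stage 1: BM25 filtering on entities. Stage 2: rerank the survivors.

    rerank_score is a hypothetical callable mapping a candidate index to a
    relevance score (a placeholder for a multimodal reranker).
    """
    docs = [tokenize(c) for c in captions]
    query = [tok for entity in query_entities for tok in tokenize(entity)]
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:top_k]
    return sorted(candidates, key=rerank_score, reverse=True)


captions = [
    "Flooding hits Jakarta after heavy rain in January",
    "Olympic torch relay passes through Paris",
    "Paris floods close the Louvre museum",
]
# Toy reranker scores, standing in for a model's image-text similarity.
toy_scores = {1: 0.9, 2: 0.2}
result = two_stage_retrieve(
    ["Paris", "floods"], captions, lambda i: toy_scores.get(i, 0.0)
)
```

In this toy run the cheap BM25 stage discards the caption that shares no entity with the query, so the more expensive reranker only has to score the remaining `top_k` candidates.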

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence

Published: 2025-12-24 15:02:33 UTC

2512.21221
