OW-VAP: Visual Attribute Parsing for Open World Object Detection

Authors: Xing Xi, Xing Fu, Weiqiang Wang, Ronghua Luo

Open World Object Detection (OWOD) requires a detector to continuously identify and learn new categories. Existing methods rely on a large language model (LLM) to describe the visual attributes of known categories and use these attributes to mark potential objects. The performance of such methods depends on the accuracy of the LLM descriptions, and selecting appropriate attributes during incremental learning remains a challenge. In this paper, we propose a novel OWOD framework, termed OW-VAP, which operates independently of LLMs and requires only minimal object descriptions to detect unknown objects. Specifically, we propose a Visual Attribute Parser (VAP) that parses the attributes of visual regions and assesses object potential based on the similarity between these attributes and the object descriptions. To enable the VAP to recognize objects in unlabeled areas, we exploit potential objects within background regions. Finally, we propose Probabilistic Soft Label Assignment (PSLA) to prevent the optimization conflicts caused by misidentifying background as foreground. Comparative results on the OWOD benchmark demonstrate that our approach surpasses existing state-of-the-art methods, with a +13 improvement in U-Recall and a +8 increase in U-AP for unknown-object detection. Furthermore, OW-VAP approaches the detector's upper limit on unknown recall.
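The abstract does not say how VAP turns attribute-to-description similarity into an objectness score. As a rough illustration only, the sketch below assumes that each region's parsed attributes and each object description are already embedded as vectors in a shared space. It matches every attribute to its closest description by cosine similarity, averages those best matches, and rescales the result to [0, 1]. All function names, the max-then-mean aggregation, and the 0.75 threshold are hypothetical choices, not the paper's method, and PSLA is not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_objectness(region_attributes: List[Vector],
                         description_embeddings: List[Vector]) -> float:
    """Hypothetical objectness score in [0, 1].

    Each parsed attribute of a region is matched to its most similar
    object description (max over descriptions). The best matches are
    averaged, and the mean cosine in [-1, 1] is rescaled to [0, 1].
    """
    if not region_attributes or not description_embeddings:
        return 0.0
    best_matches = [
        max(cosine_similarity(attr, desc) for desc in description_embeddings)
        for attr in region_attributes
    ]
    mean_similarity = sum(best_matches) / len(best_matches)
    return (mean_similarity + 1.0) / 2.0


def select_potential_objects(regions: List[List[Vector]],
                             description_embeddings: List[Vector],
                             threshold: float = 0.75) -> List[int]:
    """Indices of regions (e.g. unlabeled background proposals) whose
    hypothetical objectness score reaches the threshold."""
    return [
        i for i, attrs in enumerate(regions)
        if attribute_objectness(attrs, description_embeddings) >= threshold
    ]


if __name__ == "__main__":
    # Toy 3-d embeddings standing in for learned features.
    descriptions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    regions = [
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # attributes match descriptions
        [[0.0, 0.0, 1.0]],                   # unrelated attribute
    ]
    print(select_potential_objects(regions, descriptions))
```

With these toy vectors, the first region scores 1.0 and the second scores 0.5, so only index 0 is kept as a candidate unknown object. Taking the max over descriptions lets any one description explain an attribute. Averaging over attributes keeps a single spurious match from dominating the region's score.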

Subject: ICML.2025 - Poster

OXIIRxwJwx@OpenReview