Open-set fine-grained recognition (OSFGR) is a core problem in building open-world intelligent systems. Its central challenge is the gradual semantic drift that arises in the transition from coarse-grained to fine-grained categories. Existing methods leverage hierarchical representations to assist progressive reasoning, but they neglect semantic consistency across hierarchy levels. To address this, we propose a multimodal progressive bidirectional reasoning framework: (1) in forward reasoning, the model progressively refines visual features to capture hierarchical structural representations; (2) in backward reasoning, variational inference integrates multimodal information to enforce consistency in category-aware latent spaces. This mechanism mitigates semantic drift through bidirectional information flow and cross-hierarchical feature consistency constraints. Extensive experiments on the iNat2021-OSR dataset demonstrate that the proposed method achieves superior performance.
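
To make the notion of cross-hierarchical consistency concrete, the following is a minimal illustrative sketch, not the framework's actual variational objective. It assumes a two-level hierarchy, a hypothetical `parent_of` map from fine to coarse classes, and a simple KL penalty between the coarse-level prediction and the coarse distribution implied by the fine-level prediction. All function names and the toy logits are assumptions introduced for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_to_coarse(fine_probs, parent_of, num_coarse):
    """Sum fine-class probabilities into their parent coarse classes."""
    coarse = [0.0] * num_coarse
    for fine_idx, p in enumerate(fine_probs):
        coarse[parent_of[fine_idx]] += p
    return coarse


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def hierarchy_consistency(coarse_logits, fine_logits, parent_of):
    """Penalty that grows when the fine-level prediction implies a different
    coarse distribution than the coarse-level prediction (assumed form)."""
    coarse_probs = softmax(coarse_logits)
    fine_probs = softmax(fine_logits)
    implied = aggregate_to_coarse(fine_probs, parent_of, len(coarse_logits))
    return kl_divergence(coarse_probs, implied)


# Hypothetical hierarchy: fine classes 0 and 1 belong to coarse class 0,
# fine classes 2 and 3 belong to coarse class 1.
parent_of = [0, 0, 1, 1]

# Fine prediction agrees with the coarse prediction: penalty near zero.
consistent = hierarchy_consistency([2.0, 0.0], [2.0, 2.0, 0.0, 0.0], parent_of)

# Fine prediction has drifted to the other coarse branch: large penalty.
drifted = hierarchy_consistency([2.0, 0.0], [0.0, 0.0, 2.0, 2.0], parent_of)

print(f"consistent: {consistent:.4f}, drifted: {drifted:.4f}")
```

In this toy setting the penalty is close to zero when the two levels agree and grows when the fine-level prediction moves to a different coarse branch, which is the kind of semantic drift a consistency constraint is meant to discourage.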