Mamba, a selective state-space model, has recently seen widespread application across visual tasks owing to its exceptional ability to capture long-range dependencies. While it has shown promising results in image classification, its potential for multi-label image classification remains underexplored. To bridge this gap, we propose a novel Mamba-based decoder that exploits the intrinsic attention of Mamba to aggregate visual information from image features into label embeddings, yielding label-specific visual representations. Building upon this, we develop the MambaML framework for multi-label image classification, which models the self-correlations of image features and of label embeddings with bi-directional Mamba, and their cross-correlations with the Mamba-based decoder, so that visual spatial relationships, label semantic dependencies, and cross-modal associations are explored in a unified system. In this way, robust label-specific visual representations are acquired, facilitating the training of binary classifiers for accurate label recognition. Experiments on public benchmarks show that MambaML achieves performance comparable to state-of-the-art methods in multi-label image classification while requiring fewer parameters and lower computational overhead.
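For concreteness, the following is a minimal PyTorch sketch of the pipeline the abstract outlines: bi-directional sequence encoders for patch tokens and label embeddings, a cross-modal mixing stage standing in for the Mamba-based decoder, and per-label binary classifiers. Everything here is an assumption rather than the authors' implementation: `SequenceMixerStub` is a lightweight gated-convolution stand-in for a real selective-SSM (Mamba) block so the code runs without CUDA kernels, the concatenate-then-scan decoder only approximates the described decoder, and all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class SequenceMixerStub(nn.Module):
    """Stand-in for a real Mamba (selective SSM) block: a depthwise causal
    conv plus gating. A real implementation would use a Mamba layer here."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=4, padding=3, groups=dim)
        self.gate = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, L, D)
        h = self.norm(x)
        # Causal depthwise conv; trim the right-side padding back to length L.
        c = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.proj(c * torch.sigmoid(self.gate(h)))

class BiMambaEncoder(nn.Module):
    """Bi-directional scan: mix the sequence forward and reversed, then sum,
    modeling self-correlations within a single token sequence."""
    def __init__(self, dim):
        super().__init__()
        self.fwd = SequenceMixerStub(dim)
        self.bwd = SequenceMixerStub(dim)

    def forward(self, x):
        return self.fwd(x) + self.bwd(x.flip(1)).flip(1)

class MambaMLSketch(nn.Module):
    """Hypothetical MambaML-style head: encode patch tokens and label
    embeddings separately, mix them jointly (decoder role), then read out
    label-specific representations for per-label binary classification."""
    def __init__(self, dim=256, num_labels=80):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(num_labels, dim) * 0.02)
        self.img_enc = BiMambaEncoder(dim)  # visual spatial relationships
        self.lbl_enc = BiMambaEncoder(dim)  # label semantic dependencies
        self.decoder = BiMambaEncoder(dim)  # cross-modal associations
        self.cls = nn.Linear(dim, 1)        # binary classifier per label slot

    def forward(self, patch_tokens):  # (B, N, D) from a visual backbone
        B = patch_tokens.size(0)
        v = self.img_enc(patch_tokens)
        l = self.lbl_enc(self.label_emb.expand(B, -1, -1))
        mixed = self.decoder(torch.cat([v, l], dim=1))  # joint sequence scan
        label_repr = mixed[:, v.size(1):]               # label-specific slots
        return self.cls(label_repr).squeeze(-1)         # (B, num_labels) logits

logits = MambaMLSketch()(torch.randn(2, 196, 256))  # e.g. 14x14 patch grid
print(logits.shape)  # torch.Size([2, 80])
```

The logits would feed a per-label binary cross-entropy loss, matching the abstract's framing of multi-label recognition as training one binary classifier per label on its label-specific representation.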