2602.07534

Total: 1

#1 Fine-Grained Cat Breed Recognition with Global Context Vision Transformer [PDF1] [Copy] [Kimi] [REL]

Authors: Mowmita Parvin Hera, Md. Shahriar Mahmud Kallol, Shohanur Rahman Nirob, Md. Badsha Bulbul, Jubayer Ahmed, M. Zhourul Islam, Hazrat Ali, Mohammmad Farhad Bulbul

Accurate identification of cat breeds from images is a challenging task due to subtle differences in fur patterns, facial structure, and color. In this paper, we present a deep learning-based approach for classifying cat breeds using a subset of the Oxford-IIIT Pet Dataset, which contains high-resolution images of various domestic breeds. We employed the Global Context Vision Transformer (GCViT) architecture-tiny for cat breed recognition. To improve model generalization, we used extensive data augmentation, including rotation, horizontal flipping, and brightness adjustment. Experimental results show that the GCViT-Tiny model achieved a test accuracy of 92.00% and validation accuracy of 94.54%. These findings highlight the effectiveness of transformer-based architectures for fine-grained image classification tasks. Potential applications include veterinary diagnostics, animal shelter management, and mobile-based breed recognition systems. We also provide a hugging face demo at https://huggingface.co/spaces/bfarhad/cat-breed-classifier.

Subjects: Computer Vision and Pattern Recognition , Artificial Intelligence , Image and Video Processing

Publish: 2026-02-07 13:13:47 UTC