Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

Authors: Atik Faysal, Mohammad Rostami, Reihaneh Gh. Roshan, Nikhil Muralidhar, Huaxia Wang

We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Code is available at https://github.com/atik666/ssmae.
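The gating idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `select_pseudo_labels`, the parameters `val_gate` and `conf_threshold`, their default values, and the choice to measure confidence on the weakly augmented view are all assumptions made for illustration. The sketch keeps an unlabeled sample only if the validation gate is open, the weak-view prediction is confident, and the weak and strong views agree on the class.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def select_pseudo_labels(logits_weak, logits_strong, val_accuracy,
                         val_gate=0.7, conf_threshold=0.95):
    """Return (mask, labels) for a batch of unlabeled samples.

    val_gate and conf_threshold are hypothetical values, not taken from the paper.
    labels is -1 wherever a sample is not selected.
    """
    n = logits_weak.shape[0]

    # Gate: no pseudo-labels at all until validation accuracy is high enough.
    if val_accuracy < val_gate:
        return np.zeros(n, dtype=bool), np.full(n, -1)

    p_weak = softmax(logits_weak)
    p_strong = softmax(logits_strong)

    labels = p_weak.argmax(axis=1)
    # Assumption: confidence is checked on the weak view only.
    confident = p_weak.max(axis=1) >= conf_threshold
    # Consistency: both views must predict the same class.
    consistent = labels == p_strong.argmax(axis=1)

    mask = confident & consistent
    return mask, np.where(mask, labels, -1)
```

In a training loop, the returned mask would typically restrict the classification loss on unlabeled samples to the selected ones, while the reconstruction loss is computed on all images. How the two losses are weighted is not stated in the abstract.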

Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Published: 2026-01-27 21:32:22 UTC

arXiv: 2601.20072
