This paper introduces a multi-modal masked autoencoder (MMAE) that jointly denoises and classifies radio signals by fusing time-domain IQ sequences and constellation diagrams within a cross-attentive transformer. The approach treats noise as a learnable modality to enhance robustness, and combines a dynamic masking curriculum with domain-regularized training and a hybrid loss function to promote domain-invariant features. Experiments on the RadioML 2018.01A and RadioML22 datasets demonstrate superior accuracy across SNR levels while using substantially less labeled data than state-of-the-art approaches.
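The cross-attentive fusion of the two modalities can be sketched as follows. This is a minimal illustrative example, not the paper's actual architecture: the module name, embedding dimension, head count, and token counts are all assumptions, and each modality's tokens simply attend to the other's with a residual connection.

```python
import torch
import torch.nn as nn

class CrossAttentiveFusion(nn.Module):
    """Fuse token embeddings from two modalities via cross-attention.

    Hypothetical sketch: IQ tokens query constellation tokens and vice
    versa; dimensions and layer structure are illustrative only, not the
    paper's reported configuration.
    """
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.iq_to_const = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.const_to_iq = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_iq = nn.LayerNorm(dim)
        self.norm_const = nn.LayerNorm(dim)

    def forward(self, iq_tokens, const_tokens):
        # Each modality attends to the other (query = own tokens,
        # key/value = other modality's tokens).
        iq_fused, _ = self.iq_to_const(iq_tokens, const_tokens, const_tokens)
        const_fused, _ = self.const_to_iq(const_tokens, iq_tokens, iq_tokens)
        # Residual connection followed by layer normalization.
        iq_tokens = self.norm_iq(iq_tokens + iq_fused)
        const_tokens = self.norm_const(const_tokens + const_fused)
        return iq_tokens, const_tokens

# Toy shapes: batch of 2, 64 IQ tokens and 49 constellation patches, dim 128.
iq = torch.randn(2, 64, 128)
const = torch.randn(2, 49, 128)
fusion = CrossAttentiveFusion()
iq_out, const_out = fusion(iq, const)
```

In a full MMAE, blocks like this would sit inside the transformer encoder, with masked-token reconstruction and the classification head trained jointly under the hybrid loss.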