Total: 1
Existing multimodal sentiment analysis (MSA) methods have achieved significant success, leveraging cross-modal large-scale models (LLMs) and extensive pre-training data. However, these methods struggle to handle MSA tasks in low-resource languages. While multilingual LLMs enable cross-lingual transfer, they are limited to textual data and cannot address multimodal scenarios. To achieve MSA in low-resource languages, we propose a novel transfer learning framework named Language Family Disentanglement and Rethinking Transfer (LFD-RT). During pre-training, we establish cross-lingual and cross-modal alignments, followed by a language family disentanglement module that enhances the sharing of language universals within families while reducing noise from cross-family alignments. We propose a rethinking strategy for unsupervised fine-tuning that adapts the pre-trained model to MSA tasks in low-resource languages. Experimental results demonstrate the superiority of our method and its strong language-transfer capability on target low-resource languages. We commit to making our code and data publicly available, and the access link will be provided here.