SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction

#1 SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction [PDF²] [Copy] [Kimi] [REL]

Authors: Quanjiang Guo, Chong Mu, Jiazhou Pan, Ming Jia, Ling Tian, Hui Gao, Zhao Kang

Multimodal Information Extraction (MIE)-covering tasks such as Multimodal Named Entity Recognition (MNER), Relation Extraction (MRE), and Event Extraction (MEE)-is essential for understanding multimedia content but remains constrained by severe data scarcity. Although data augmentation is a promising remedy, existing approaches are impeded by coarse cross-modal alignment and fragmented, task-specific designs that fail to exploit shared semantic knowledge. To overcome these limitations, we introduce Semantic Anchor-aligned Multimodal Augmentation (SAMA), a unified framework for generating high-fidelity, task-aware synthetic data. SAMA constructs structured semantic anchors from ground-truth labels to guide a Collaborative Multi-Experts Multimodal Large Language Model (CME-MLLM), which integrates a Universal Adapter for shared semantics with Task-Specific Adapters to produce diverse yet constraint-compliant textual samples. For image synthesis, SAMA employs an Anchor-Preserving Diffusion mechanism that uses anchor-weighted prompts and latent conditioning to maintain critical semantic anchors while diversifying visual contexts. To eliminate the need for manual verification, SAMA further introduces a Dual-Constraint Filtering module that selects synthetic samples based on both cross-modal consistency and anchor fidelity. Extensive experiments across benchmark datasets for MNER, MRE, and MEE demonstrate that SAMA consistently outperforms state-of-the-art augmentation baselines under both fully supervised and low-resource settings, underscoring its versatility, robustness, and effectiveness.

Subjects: Computer Vision and Pattern Recognition , Computation and Language , Multimedia

Publish: 2026-06-17 07:43:33 UTC

2606.18780

#1 SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction [PDF2] [Copy] [Kimi] [REL]

#1 SAMA: Semantic Anchor-aligned Augmentation for Unified Low-Resource Multimodal Information Extraction [PDF²] [Copy] [Kimi] [REL]