rIwUDnRGky@OpenReview


#1 Language‑Bias‑Resilient Visual Question Answering via Adaptive Multi‑Margin Collaborative Debiasing

Authors: Huanjia Zhu, Shuyuan Zheng, Yishu Liu, Sudong Cai, Bingzhi Chen

Language bias in Visual Question Answering (VQA) arises when models exploit spurious statistical correlations between question templates and answers, particularly in out-of-distribution scenarios, thereby neglecting essential visual cues and compromising genuine multimodal reasoning. Despite numerous efforts to enhance the robustness of VQA models, a principled understanding of how such bias originates and influences model behavior remains underdeveloped. In this paper, we address this gap through a comprehensive empirical and theoretical analysis, revealing that modality-specific gradient imbalances, which originate from the inherent heterogeneity of multimodal data, lead to skewed feature fusion and biased classifier weights. To alleviate these issues, we propose a novel Multi-Margin Collaborative Debiasing (MMCD) framework that adaptively integrates frequency-, confidence-, and difficulty-aware angular margins with a difficulty-aware contrastive learning mechanism to dynamically reshape decision boundaries. Extensive experiments across multiple challenging VQA benchmarks confirm the consistent superiority of our proposed MMCD over state-of-the-art baselines in combating language bias.
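The abstract's core idea of reshaping decision boundaries with adaptive angular margins can be illustrated with a minimal sketch. The snippet below is a hypothetical frequency-aware additive angular-margin loss, not the paper's exact MMCD formulation: the function name, the margin schedule `m_c = m0 * (1 - freq_c)` (rarer answers get larger margins), and the scale constant are all illustrative assumptions.

```python
import numpy as np

def frequency_aware_margin_loss(features, weights, labels, answer_freq,
                                base_margin=0.2, scale=16.0):
    """Additive angular-margin cross-entropy where the per-class margin
    grows for rare answers. Illustrative sketch only; not the paper's
    exact frequency-/confidence-/difficulty-aware formulation."""
    # L2-normalize features and classifier weights so logits are cosines.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                      # (batch, num_classes)
    theta = np.arccos(np.clip(cos, -1 + 1e-7, 1 - 1e-7))

    # Assumed schedule: rarer answers receive a larger margin.
    margins = base_margin * (1.0 - answer_freq)        # (num_classes,)

    # Penalize only the ground-truth class angle, enlarging its margin
    # to the decision boundary.
    idx = np.arange(len(labels))
    theta_m = theta.copy()
    theta_m[idx, labels] += margins[labels]

    logits = scale * np.cos(theta_m)
    # Numerically stable softmax cross-entropy.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()
```

Because frequent answers receive a smaller margin while rare answers must be separated by a wider angular gap, the classifier cannot satisfy the objective by leaning on answer-frequency priors alone, which is the intuition behind margin-based debiasing of biased classifier weights.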

Subject: NeurIPS.2025 - Poster