Adaptive Gradient Masking for Balancing ID and MLLM-based Representations in Recommendation

Authors: Yidong Wu, Siyuan Chen, Binrui Wu, Fan Li, Jiechao Gao

In large-scale recommendation systems, multimodal (MM) content is increasingly introduced to enhance the generalization of ID features. The rise of Multimodal Large Language Models (MLLMs) enables the construction of unified user and item representations. However, the semantic distribution gap between MM and ID representations leads to "convergence inconsistency" during joint training: the ID branch converges quickly, while the MM branch requires more epochs, which limits overall performance. To address this, we propose a two-stage framework comprising MM representation learning and joint training optimization. First, we fine-tune the MLLM to generate unified user and item representations, and introduce collaborative signals by post-aligning user ID representations to alleviate semantic differences. Then, we propose an Adaptive Gradient Masking (AGM) training strategy that dynamically regulates parameter updates between the ID and MLLM branches. AGM estimates the contribution of each representation via mutual information and applies non-uniform gradient masking at the sub-network level to balance optimization. We provide a theoretical analysis of AGM's effectiveness and further introduce an unbiased variant, AGM*, to enhance training stability. Offline experiments and online A/B tests validate the effectiveness of our approach in mitigating convergence inconsistency and improving performance.
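To make the gradient-masking idea concrete, below is a minimal sketch of one AGM-style training step in PyTorch. Everything here is a hypothetical illustration, not the authors' implementation: the names (`TwoBranchModel`, `id_branch`, `mm_branch`, `masked_step`) are invented, and the fixed `id_score` / `mm_score` values stand in for the paper's mutual-information contribution estimates, which the abstract does not specify.

```python
# Sketch of one AGM-style step, assuming a PyTorch setup with separate ID
# and multimodal (MM) sub-networks. All names are hypothetical; the fixed
# contribution scores stand in for mutual-information estimates.
import torch
import torch.nn as nn

class TwoBranchModel(nn.Module):
    def __init__(self, id_dim=32, mm_dim=64, hidden=16):
        super().__init__()
        self.id_branch = nn.Linear(id_dim, hidden)  # fast-converging ID branch
        self.mm_branch = nn.Linear(mm_dim, hidden)  # slower MLLM-based branch
        self.head = nn.Linear(hidden, 1)

    def forward(self, id_x, mm_x):
        return self.head(torch.relu(self.id_branch(id_x) + self.mm_branch(mm_x)))

def masked_step(model, loss, id_score, mm_score):
    """Backprop, then stochastically mask each sub-network's gradients:
    the more a branch currently contributes, the more often its update
    is skipped, letting the under-trained branch catch up."""
    loss.backward()
    total = id_score + mm_score
    for branch, score in ((model.id_branch, id_score), (model.mm_branch, mm_score)):
        keep_prob = 1.0 - score / total                         # non-uniform
        keep = torch.bernoulli(torch.tensor(keep_prob)).item()  # 0.0 or 1.0
        for p in branch.parameters():
            if p.grad is not None:
                p.grad.mul_(keep)                               # sub-network-level mask

model = TwoBranchModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
id_x, mm_x, y = torch.randn(8, 32), torch.randn(8, 64), torch.randn(8, 1)

opt.zero_grad()
loss = nn.functional.mse_loss(model(id_x, mm_x), y)
masked_step(model, loss, id_score=0.7, mm_score=0.3)  # scores assumed precomputed
opt.step()
```

In this sketch the mask is a single Bernoulli draw per sub-network, matching the abstract's sub-network-level granularity. Because masking scales expected gradients by the keep probability, one natural unbiased correction would divide surviving gradients by that probability; the abstract describes AGM* only as "unbiased", so that correspondence is an assumption rather than the paper's stated method.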

Subject: NeurIPS.2025 - Poster