Progressive LoRA for Multimodal Continual Instruction Tuning

Authors: Yahan Yu, Duzhen Zhang, Yong Ren, Xuanle Zhao, Xiuyi Chen, Chenhui Chu

Multimodal Continual Instruction Tuning (MCIT) empowers Multimodal Large Language Models (MLLMs) to adapt to ever-evolving requirements without continuous, costly retraining. However, MCIT faces challenges in mitigating Catastrophic Forgetting (CF) and enhancing Knowledge Transfer (KT). Existing works combine Mixture-of-Experts (MoE) and LoRA to address these challenges, but using a fixed number of shared LoRA blocks across tasks can overwrite acquired knowledge, making it harder for MLLMs to handle CF and KT. Therefore, we propose the **Prog**ressive **LoRA** framework (ProgLoRA), which maintains a progressive LoRA pool and trains a new LoRA block for each incremental task to reduce knowledge interference. Specifically, ProgLoRA has two key mechanisms: task-aware allocation, which effectively leverages acquired knowledge on the current task, and task recall, which realigns the model with previously learned tasks. Additionally, considering different application scenarios, we design a static ProgLoRA for the more idealized basic setting and a dynamic ProgLoRA for the more realistic, challenging setting. Experiments on the latest MCIT benchmark demonstrate that ProgLoRA outperforms existing approaches.

Subject: ACL.2025 - Findings

2025.findings-acl.143@ACL
