Total: 1
Fine-tuning LLMs is both computationally andmemory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA,reduce the number of trainable parameters andlower memory usage, they do not decrease computational cost. In some cases, they may evenslow down fine-tuning. In this paper, we introduceSparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We proposea lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset ofweights for loss and gradient computation. Also,we systematically analyze and address sensitivityacross layers, tokens, and training steps. Our experimental results show that SparseLoRA reducescomputational cost by up to $2.0\times$ and a measuredspeedup of up to $1.5\times$ while maintaining accuracy across various downstream tasks, includingcommonsense and arithmetic reasoning, code generation, and instruction following.