#1 LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits

Authors: Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun

Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization (mapping, threshold selection, and precision assignment) while leveraging efficient CUDA kernels for scalable deployment. Extensive evaluations across 4 LLMs and 4 datasets show that LowRA achieves a superior performance–precision trade-off above 2 bits and remains accurate down to 1.15 bits, reducing memory usage by up to 50%. Our results highlight the potential of ultra-low-bit LoRA fine-tuning for resource-constrained environments.
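
The abstract names three quantization knobs (mapping, threshold selection, precision assignment) without detailing them. The sketch below illustrates how such a mixed-precision scheme could look in principle, assuming quantile-based codebooks and a per-group average bit budget; quantize_group, assign_precision, and all parameter choices are hypothetical illustrations for exposition, not LowRA's actual algorithm or API.

```python
import numpy as np

def quantize_group(w, bits):
    # Mapping + threshold selection (illustrative): place codebook levels at
    # quantiles of the weight distribution, with decision thresholds at the
    # midpoints between adjacent levels.
    levels = 2 ** bits
    codebook = np.quantile(w, (np.arange(levels) + 0.5) / levels)  # mapping values
    edges = (codebook[:-1] + codebook[1:]) / 2                     # thresholds
    idx = np.searchsorted(edges, w)                                # bucket each weight
    return codebook[idx]

def assign_precision(groups, avg_bits=1.5):
    # Precision assignment (illustrative): give 2 bits to the groups that
    # benefit most over 1-bit quantization, subject to an average bit budget.
    err_gain = [np.mean((g - quantize_group(g, 1)) ** 2)
                - np.mean((g - quantize_group(g, 2)) ** 2) for g in groups]
    n_two_bit = int((avg_bits - 1.0) * len(groups))  # how many groups afford 2 bits
    order = np.argsort(err_gain)[::-1]               # most error-sensitive first
    bits = np.ones(len(groups), dtype=int)
    bits[order[:n_two_bit]] = 2
    return bits

rng = np.random.default_rng(0)
groups = [rng.normal(scale=s, size=256) for s in (0.5, 1.0, 2.0, 4.0)]
bits = assign_precision(groups, avg_bits=1.5)
for g, b in zip(groups, bits):
    q = quantize_group(g, b)
    print(f"{b}-bit group: RMSE = {np.sqrt(np.mean((g - q) ** 2)):.3f}")
```

Under this kind of scheme, a fractional average bit-width such as 1.15 arises naturally: most groups stay at 1 bit while the error-sensitive minority receives 2 bits, which is one plausible reading of how a sub-2-bit average could be achieved.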

Subject: ICML.2025 - Poster