OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu

Large language models (LLMs) trained on extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function that has three components (masking, distillation, and world fact). Using low-rank adapters (LoRA), it remains efficient without compromising unlearning quality. We conduct experiments on multiple datasets, including the Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics: forget quality (via a new document-level memorization score), model utility, and fluency. Results demonstrate its effectiveness in resisting membership inference attacks, minimizing the impact on retained data, and maintaining robustness across diverse scenarios.
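The abstract names the three loss components but does not give their formulas. The following is a minimal sketch of how such a three-part objective could be combined, not the authors' implementation. Every function name, the specific form of each term (probability mass on target tokens, KL divergence to a frozen teacher, cross-entropy on a factual token), and the weights are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masking_loss(student_logits, target_ids):
    """Assumed form: penalize probability mass placed on target (to-forget) tokens."""
    probs = softmax(student_logits)
    mass = sum(probs[i] for i in target_ids)
    return -math.log(max(1.0 - mass, 1e-12))

def distillation_loss(student_logits, teacher_logits):
    """Assumed form: KL(teacher || student) at one retain-set position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

def world_fact_loss(student_logits, gold_id):
    """Assumed form: cross-entropy on the correct token of a general-knowledge text."""
    return -math.log(max(softmax(student_logits)[gold_id], 1e-12))

def combined_loss(mask_term, distill_term, fact_term, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    w_m, w_d, w_f = weights
    return w_m * mask_term + w_d * distill_term + w_f * fact_term

if __name__ == "__main__":
    # Toy 4-token vocabulary; token 0 plays the role of a target token.
    student = [2.0, 0.5, 0.1, -1.0]
    teacher = [0.2, 0.6, 0.1, -1.0]
    total = combined_loss(
        masking_loss(student, target_ids=[0]),
        distillation_loss(student, teacher),
        world_fact_loss(student, gold_id=1),
    )
    print(f"combined loss: {total:.4f}")
```

In this sketch, lowering the student's logit for the target token shrinks the masking term, while the distillation term is zero only when the student matches the teacher. In a LoRA setup, only the adapter weights would receive gradients from such an objective.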

Subjects: Computation and Language, Artificial Intelligence, Cryptography and Security, Machine Learning

Publish: 2025-05-07 13:51:42 UTC

arXiv: 2505.04416
