Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner


Authors: Runa Eschenhagen, Aaron Defazio, Tsung-Hsien Lee, Richard E. Turner, Hao-Jun Michael Shi

The recent success of Shampoo in the AlgoPerf contest has sparked renewed interest in Kronecker-factorization-based optimization algorithms for training neural networks. Despite its success, Shampoo relies heavily on several heuristics such as learning rate grafting and stale preconditioning to achieve performance at scale. These heuristics increase algorithmic complexity, necessitate further hyperparameter tuning, and lack theoretical justification. This paper investigates these heuristics from the angle of Frobenius norm approximation to full-matrix Adam and decouples the updates of the preconditioner's eigenvalues and eigenbasis. We show that grafting from Adam mitigates the staleness and mis-scaling of the preconditioner's *eigenvalues* and that correcting the eigenvalues directly eliminates the need for learning rate grafting. To manage the error induced by infrequent *eigenbasis* computations, we propose an adaptive criterion for determining the eigenbasis computation frequency, motivated by the termination test of a warm-started QR algorithm. This criterion decouples the update frequency of different preconditioner matrices and enables us to investigate the impact of approximation error on convergence. These practical techniques offer a principled angle towards removing Shampoo's heuristics and developing improved Kronecker-factorization-based training algorithms.
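The decoupling described in the abstract can be illustrated with a minimal NumPy sketch for a single weight matrix. Everything specific here is an assumption rather than the authors' implementation: the function names (`offdiag_ratio`, `maybe_refresh_basis`, `init_state`, `step`), the hyperparameters `beta`, `tau`, and `eps`, the use of an Adam-style second moment in the rotated basis as the "corrected" eigenvalues, and the relative off-diagonal Frobenius mass as the refresh criterion. The abstract does not spell out the exact criterion, so this is only one plausible form of it.

```python
import numpy as np

def offdiag_ratio(A, Q):
    """Relative Frobenius mass of the off-diagonal part of Q^T A Q.

    Near zero when the columns of Q are (almost) eigenvectors of A.
    Assumed stand-in for the paper's adaptive criterion.
    """
    B = Q.T @ A @ Q
    off = B - np.diag(np.diag(B))
    return np.linalg.norm(off) / max(np.linalg.norm(B), 1e-30)

def maybe_refresh_basis(A, Q, tau):
    """Recompute the eigenbasis of A only if the stale basis Q is too inaccurate."""
    if offdiag_ratio(A, Q) > tau:
        _, Q = np.linalg.eigh(A)  # A is symmetric PSD by construction
        return Q, True
    return Q, False

def init_state(m, n):
    """Kronecker factors L (m x m), R (n x n), their bases, and rotated second moment D."""
    return np.zeros((m, m)), np.zeros((n, n)), np.eye(m), np.eye(n), np.zeros((m, n))

def step(state, G, beta=0.999, tau=0.1, eps=1e-12):
    """One preconditioning step for gradient G; returns (new_state, update direction)."""
    L, R, QL, QR, D = state
    # Shampoo-style Kronecker factor statistics.
    L = beta * L + (1 - beta) * (G @ G.T)
    R = beta * R + (1 - beta) * (G.T @ G)
    # Each factor decides independently whether its eigenbasis needs refreshing.
    QL, _ = maybe_refresh_basis(L, QL, tau)
    QR, _ = maybe_refresh_basis(R, QR, tau)
    # Eigenvalues are tracked every step in the (possibly stale) eigenbasis.
    # Simplification: D is not re-projected when a basis changes.
    Gt = QL.T @ G @ QR
    D = beta * D + (1 - beta) * Gt**2
    update = QL @ (Gt / (np.sqrt(D) + eps)) @ QR.T
    return (L, R, QL, QR, D), update
```

In this sketch the eigenvalue estimate `D` is refreshed at every step while each eigenbasis is recomputed only when its own error measure exceeds `tau`, so the two factors can end up on different refresh schedules. No grafting step appears because the per-step scaling comes from `D` directly.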

Subject: NeurIPS.2025 - Spotlight

kePsKwxvaV@OpenReview
