fqpbXJ2QtC@OpenReview

Total: 1

#1 Revitalizing SVD for Global Covariance Pooling: Halley’s Method to Overcome Over-Flattening

Authors: Jiawei Gu, Ziyue Qiao, Xinming Li, Zechao Li

Global Covariance Pooling (GCP) has garnered increasing attention in visual recognition tasks, where second-order statistics frequently yield stronger representations than first-order approaches. However, the two main streams of GCP, Newton--Schulz-based iSQRT-COV and exact or near-exact SVD methods, struggle at opposite ends of the training spectrum. iSQRT-COV stabilizes early learning by avoiding gradient explosions, but it over-compresses the large eigenvalues in later stages, causing an \emph{over-flattening} phenomenon that stalls final accuracy. In contrast, SVD-based methods excel at preserving the high-eigenvalue structure essential for deep networks but are sensitive to small eigenvalue gaps early in training. We propose \textbf{Halley-SVD}, a high-order iterative method that unites the smooth-gradient advantage of iSQRT-COV with the late-stage spectral fidelity of SVD. Grounded in Halley's iteration, our approach obviates explicit divisions by $(\lambda_i - \lambda_j)$ and forgoes threshold- or polynomial-based heuristics. As a result, it prevents both early gradient explosions and the excessive compression of large eigenvalues. Extensive experiments on CNN and transformer architectures show that Halley-SVD consistently outperforms iSQRT-COV at large model scales and batch sizes, achieving higher overall accuracy without mid-training switches or custom truncations. This work offers a new resolution of the long-standing dichotomy in GCP, illustrating how high-order methods can balance robustness and spectral precision to fully harness the representational power of modern deep networks. (A minimal Halley-type square-root iteration is sketched after this entry.)

Subject: NeurIPS.2025 - Poster
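
The abstract leaves the update rule implicit. As a rough illustration of the class of method it describes, here is a minimal PyTorch sketch of a coupled Halley-type square-root iteration (the cubically convergent Padé [1/1] iteration on the matrix sign function). This is an assumption-laden sketch, not the authors' implementation: the function name `halley_sqrtm`, the iteration count, and the trace pre-normalization (borrowed from iSQRT-COV) are all illustrative choices.

```python
import torch

def halley_sqrtm(A: torch.Tensor, num_iter: int = 6, eps: float = 1e-6) -> torch.Tensor:
    """Approximate A^{1/2} for a batch of SPD matrices A of shape (B, d, d)
    with a coupled Halley-type iteration on the matrix sign function.
    Illustrative sketch only; not the paper's released code."""
    B, d, _ = A.shape
    I = torch.eye(d, dtype=A.dtype, device=A.device).expand(B, d, d)

    # Trace pre-normalization (as in iSQRT-COV) pulls the spectrum into
    # (0, 1]; the scale is compensated after the loop.
    s = A.diagonal(dim1=-2, dim2=-1).sum(-1).clamp_min(eps).view(B, 1, 1)
    Y = A / s          # Y_k -> (A/s)^{1/2}
    Z = I.clone()      # Z_k -> (A/s)^{-1/2}

    for _ in range(num_iter):
        # Y_k and Z_k are rational functions of A, so they commute and
        # P = Z_k @ Y_k is symmetric. The scalar Halley map for sign(x),
        # x <- x (3 + x^2) / (1 + 3 x^2), lifts to the matrix update
        # T = (3I + P)(I + 3P)^{-1}, applied to both coupled factors.
        P = Z @ Y
        T = torch.linalg.solve(I + 3.0 * P, 3.0 * I + P)  # = (I+3P)^{-1}(3I+P); factors commute
        Y = Y @ T
        Z = Z @ T

    return Y * s.sqrt()  # A^{1/2} = sqrt(s) * (A/s)^{1/2}

if __name__ == "__main__":
    feats = torch.randn(4, 64, 196)                        # (batch, channels, spatial positions)
    cov = feats @ feats.transpose(1, 2) / feats.shape[-1]  # per-image covariance, (4, 64, 64)
    cov = cov + 1e-5 * torch.eye(64)                       # mild ridge keeps cov strictly SPD
    root = halley_sqrtm(cov)
    print((root @ root - cov).norm() / cov.norm())         # relative residual, should be small
```

Two properties of this construction match the trade-off the abstract describes: the whole computation is matrix products plus one batched linear solve per step, so autograd differentiates it without ever forming eigengap terms $1/(\lambda_i - \lambda_j)$; and the linear solve makes each step costlier than an inverse-free Newton--Schulz step, the usual price of moving from quadratic to cubic convergence.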