f21sRSRb1E@OpenReview

Total: 1

#1 Global curvature for second-order optimization of neural networks

Author: Alberto Bernacchia

Second-order optimization methods, which leverage the local curvature of the loss function, have the potential to dramatically accelerate the training of machine learning models. However, these methods are often hindered by the computational burden of constructing and inverting large curvature matrices with $\mathcal{O}(p^2)$ elements, where $p$ is the number of parameters. In this work, we present a theory that predicts the \emph{exact} structure of the global curvature by leveraging the intrinsic symmetries of neural networks, such as invariance under parameter permutations. For Multi-Layer Perceptrons (MLPs), our approach reveals that the global curvature can be expressed in terms of $\mathcal{O}(d^2 + L^2)$ independent factors, where $d$ is the number of input/output dimensions and $L$ is the number of layers, significantly reducing the computational burden compared to the $\mathcal{O}(p^2)$ elements of the full matrix. These factors can be estimated efficiently, enabling precise curvature computations. To evaluate the practical implications of our framework, we apply second-order optimization to synthetic data, achieving markedly faster convergence than traditional optimization methods. Our findings pave the way for a better understanding of the loss landscape of neural networks and for designing more efficient training methodologies in deep learning.

Code: \href{https://github.com/mtkresearch/symo_notebooks}{github.com/mtkresearch/symo\_notebooks}
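
A minimal sketch of the scaling comparison above (not the authors' implementation): it only tallies matrix sizes for a hypothetical square MLP, with the width $d$ and depth $L$ chosen as illustrative assumptions rather than values from the paper.

# Illustrative arithmetic only: entries of a full p x p curvature matrix
# versus the d^2 + L^2 independent factors predicted by the theory.
d = 64                                 # input/output dimension (assumed)
L = 8                                  # number of layers (assumed)
p = L * d * d                          # parameters of a square MLP with d x d weight matrices, biases ignored
full_curvature_entries = p ** 2        # elements of the full p x p curvature matrix
predicted_factors = d ** 2 + L ** 2    # independent factors predicted by the theory
print(f"p = {p:,}, p^2 = {full_curvature_entries:,}, d^2 + L^2 = {predicted_factors:,}")

For these assumed sizes the full curvature matrix has roughly $10^9$ entries, while the predicted factorization involves only a few thousand independent numbers, which is the gap the abstract refers to.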

Subject: ICML.2025 - Poster