Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent

Authors: Ya-Chi Chu, Wenzhi Gao, Yinyu Ye, Madeleine Udell

This paper investigates the convergence properties of the hypergradient descent method ($\texttt{HDM}$), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of $\texttt{HDM}$ using the online learning framework and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, $\texttt{HDM}$ automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of $\texttt{HDM}$ reported in the literature, and we propose efficient strategies to address it. We also develop two $\texttt{HDM}$ variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show that $\texttt{HDM}$ with heavy-ball momentum ($\texttt{HDM-HB}$) exhibits robust performance and significantly outperforms other adaptive first-order methods. Moreover, $\texttt{HDM-HB}$ often matches the performance of $\texttt{L-BFGS}$, an efficient and practical quasi-Newton method, while using less memory and cheaper iterations.
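
To make the stepsize adaptation idea concrete, here is a minimal sketch of one common scalar-stepsize form of hypergradient descent. It is an illustration under stated assumptions, not necessarily the variant analyzed in the paper. Since $\frac{d}{d\alpha} f(x - \alpha g) = -\nabla f(x - \alpha g)^\top g$, a gradient step on the stepsize $\alpha$ increases it when consecutive gradients align and decreases it when they oppose each other. The division by $\|g\|^2$, the function name `hypergradient_descent`, the hyper-stepsize `beta`, and the isotropic quadratic test problem are all assumptions made for this example.

```python
def hypergradient_descent(grad, x0, alpha0=0.01, beta=0.05, steps=30):
    """Gradient descent whose scalar stepsize alpha is itself updated by a
    (normalized) gradient step. Illustrative sketch only."""
    x = list(x0)
    alpha = alpha0
    g = grad(x)
    for _ in range(steps):
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # stationary point reached; avoid dividing by zero
        # Ordinary gradient step with the current stepsize.
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x)
        # Hypergradient step on alpha: d f(x_new) / d alpha = -g_new . g,
        # normalized by ||g||^2 (an assumption made here for scale invariance).
        alpha += beta * sum(a * b for a, b in zip(g_new, g)) / g_sq
        g = g_new
    return x, alpha


# Hypothetical test problem: f(x) = 0.5 * a * ||x||^2 with a = 4,
# for which the ideal constant stepsize is 1 / a = 0.25.
a = 4.0
x_final, alpha_final = hypergradient_descent(lambda x: [a * xi for xi in x], [1.0, -2.0])
print(alpha_final, x_final)
```

On this isotropic quadratic the stepsize update reduces to $\alpha \leftarrow \alpha + \beta(1 - a\alpha)$, so $\alpha$ approaches $1/a$ and the per-step contraction factor $1 - a\alpha$ shrinks toward zero. This mirrors, on a toy problem, the abstract's claims about finding the locally optimal stepsize and converging superlinearly.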

Subject: ICML.2025 - Poster

NkVCB1Cpgl@OpenReview
