LKQIS65fgd@OpenReview

Total: 1

#1 Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

Authors: Sharan Vaswani, Reza Babanezhad

Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant $L$ and adapts to the ``local'' smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS ($\texttt{GD-LS}$) can result in constant-factor improvements over GD with a $1/L$ step-size (denoted as $\texttt{GD(1/L)}$). We strengthen these results and show that if the objective function satisfies a certain non-uniform smoothness condition, $\texttt{GD-LS}$ can achieve a faster convergence rate than $\texttt{GD(1/L)}$. In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, $\texttt{GD-LS}$ can converge to the optimum at a linear rate and hence improves over the sublinear convergence of $\texttt{GD(1/L)}$. Furthermore, for non-convex objectives satisfying gradient domination (e.g., those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), $\texttt{GD-LS}$ can match the fast convergence of algorithms tailored for these specific settings. Finally, we prove that under the interpolation assumption, for convex losses, stochastic GD with a stochastic line-search can match the fast convergence of $\texttt{GD-LS}$.
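To make the setup concrete, below is a minimal sketch of gradient descent with a backtracking Armijo line-search, applied to a logistic-regression objective as discussed in the abstract. The function names (`armijo_gd`), the backtracking parameters (`eta_max`, `c`, `beta`), and the synthetic data are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np

def armijo_gd(f, grad, x0, eta_max=1.0, c=0.5, beta=0.8, max_iters=1000, tol=1e-8):
    """Gradient descent with a backtracking Armijo line-search (illustrative sketch)."""
    x = x0.copy()
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        eta = eta_max
        # Backtrack until the Armijo (sufficient-decrease) condition holds:
        #   f(x - eta * g) <= f(x) - c * eta * ||g||^2
        while f(x - eta * g) > f(x) - c * eta * np.dot(g, g):
            eta *= beta
        x = x - eta * g
    return x

# Hypothetical usage: logistic regression on synthetic separable data.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
y = np.sign(A @ rng.standard_normal(10))
f = lambda w: np.mean(np.log1p(np.exp(-y * (A @ w))))
grad = lambda w: A.T @ (-y / (1.0 + np.exp(y * (A @ w)))) / len(y)
w_hat = armijo_gd(f, grad, np.zeros(10))
```

The line-search only requires extra function evaluations per iteration; the step-size it returns tracks the local smoothness along the trajectory instead of the worst-case constant $1/L$, which is the mechanism behind the faster rates claimed above.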

Subject: ICML.2025 - Poster