
Asymptotic theory of SGD with a general learning-rate

Authors: Or Goldreich, Ziyang Wei, Soham Bonnerjee, Jiaqi Li, Wei Biao Wu

Stochastic gradient descent (SGD) with polynomially decaying step sizes has long underpinned theoretical analyses, yielding a broad spectrum of statistically attractive guarantees. Yet such schedules are rarely used in practice because of their prohibitively slow convergence, revealing a persistent gap between theory and empirical performance. In this paper, we introduce a unified framework that quantifies the uncertainty of online SGD under arbitrary learning-rate choices. In particular, we provide the first comprehensive convergence characterizations for two widely used but theoretically under-examined schemes: cyclical learning rates and linear decay to zero. Our results not only explain the observed behavior of these schedules but also offer principled tools for statistical inference and algorithm design. All theoretical findings are corroborated by extensive simulations across diverse settings.
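
To make the two schedules concrete, below is a minimal sketch of online SGD on a streaming linear-regression problem under a triangular cyclical schedule and a linear decay-to-zero schedule. The data model, loss, and all parameters (eta_min, eta_max, cycle_len, the horizon T) are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 10_000
theta_star = rng.normal(size=d)  # ground-truth parameter (assumed model)

def cyclical_lr(t, eta_min=1e-3, eta_max=1e-1, cycle_len=500):
    # Triangular cyclical schedule: oscillates linearly between
    # eta_min and eta_max with period cycle_len.
    phase = (t % cycle_len) / cycle_len
    return eta_min + (eta_max - eta_min) * (1 - abs(2 * phase - 1))

def linear_decay_lr(t, eta0=1e-1, horizon=T):
    # Linear decay to zero over the horizon.
    return eta0 * max(0.0, 1 - t / horizon)

def online_sgd(schedule):
    # One pass over a stream of fresh samples; each iterate uses the
    # squared-loss gradient of a single observation.
    theta = np.zeros(d)
    for t in range(1, T + 1):
        x = rng.normal(size=d)             # fresh streaming covariate
        y = x @ theta_star + rng.normal()  # noisy linear response
        grad = (x @ theta - y) * x         # gradient of 0.5*(x.theta - y)^2
        theta -= schedule(t) * grad
    return theta

for name, sched in [("cyclical", cyclical_lr), ("linear decay", linear_decay_lr)]:
    err = np.linalg.norm(online_sgd(sched) - theta_star)
    print(f"{name}: ||theta - theta*|| = {err:.4f}")
```

The cyclical run never lets the step size vanish, so its iterates keep fluctuating around the optimum, while the linear-decay run drives the step size to zero and settles; quantifying the uncertainty of both behaviors is the kind of question the paper's framework addresses.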

Subject: NeurIPS.2025 - Poster