SGD Convergence under Stepsize Shrinkage in Low-Precision Training

Low-precision training has become crucial for reducing the computational and memory costs of large-scale deep learning. However, quantizing gradients introduces magnitude shrinkage, which can change how stochastic gradient descent (SGD) converges. In this study, we explore SGD convergence under a gradient shrinkage model, where each stochastic gradient is scaled by a factor \( q_k \in (0,1] \). We show that this shrinkage replaces the usual stepsize \( \mu_k \) with an effective stepsize \( \mu_k q_k \), slowing convergence when \( q_{\min} < 1 \). Under standard smoothness and bounded-variance assumptions, we prove that low-precision SGD still converges, but at a slower rate governed by \( q_{\min} \) and with a higher steady-state error level due to quantization effects. By treating reduced numerical precision as gradient shrinkage within the standard SGD convergence framework, we analyze theoretically how it slows training.
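
To make the effective-stepsize idea concrete, here is a minimal sketch rather than the paper's own experiment. It runs SGD on the toy objective f(x) = 0.5 x^2 and scales every stochastic gradient by a constant factor q. The objective, the constant q, the function name `sgd_with_shrinkage`, and all parameter values are assumptions chosen for illustration. The sketch models only the shrinkage, not the additional quantization error that raises the steady error level.

```python
import random


def sgd_with_shrinkage(q, mu=0.1, steps=50, noise_std=0.0, x0=1.0, seed=0):
    """Run SGD on f(x) = 0.5 * x**2, scaling each stochastic gradient by q.

    q is a hypothetical constant shrinkage factor in (0, 1]; the abstract
    allows a per-step factor q_k, which is simplified away here.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # Stochastic gradient of 0.5 * x**2: the true gradient x plus noise.
        grad = x + rng.gauss(0.0, noise_std)
        # The shrunken gradient q * grad gives an effective stepsize mu * q.
        x -= mu * q * grad
    return x


full = sgd_with_shrinkage(q=1.0)     # no shrinkage
shrunk = sgd_with_shrinkage(q=0.5)   # gradients halved
rescaled = sgd_with_shrinkage(q=1.0, mu=0.05)  # unshrunk run at stepsize mu * q

print(f"q = 1.0: |x| after 50 steps = {abs(full):.6f}")
print(f"q = 0.5: |x| after 50 steps = {abs(shrunk):.6f}")
print(f"q = 1.0, mu = 0.05: |x| after 50 steps = {abs(rescaled):.6f}")
```

In the noiseless case the iterate contracts by a factor of (1 - mu q) per step, so the run with q = 0.5 ends farther from the minimizer than the run with q = 1, and it matches an unshrunk run whose stepsize is mu q.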

Subjects: Machine Learning, Artificial Intelligence, Information Theory, Numerical Analysis

Publish: 2025-08-10 02:25:48 UTC

2508.07142
