Problem-dependent convergence bounds for randomized linear gradient compression

#1 Problem-dependent convergence bounds for randomized linear gradient compression [PDF¹] [Copy] [Kimi] [REL]

Authors: Thomas Flynn, Patrick Johnstone, Shinjae Yoo

In distributed optimization, the communication of model updates can be a performance bottleneck. Consequently, gradient compression has been proposed as a means of increasing optimization throughput. In general, due to information loss, compression introduces a penalty on the number of iterations needed to reach a solution. In this work, we investigate how the iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear schemes, where compression and decompression can be modeled as multiplication with a random matrix. We consider several distributions of matrices, among them Haar-distributed orthogonal matrices and matrices with random Gaussian entries. We find that the impact of compression on convergence can be quantified in terms of a smoothness matrix associated with the objective function, using a norm defined by the compression scheme. The analysis reveals that in certain cases, compression performance is related to low-rank structure or other spectral properties of the problem and our bounds predict that the penalty introduced by compression is significantly reduced compared to worst-case bounds that only consider the compression level, ignoring problem data. We verify the theoretical findings experimentally, including fine-tuning an image classification model.

Subjects: Optimization and Control , Machine Learning

Publish: 2024-11-19 22:26:42 UTC

2411.12898

#1 Problem-dependent convergence bounds for randomized linear gradient compression [PDF1] [Copy] [Kimi] [REL]

#1 Problem-dependent convergence bounds for randomized linear gradient compression [PDF¹] [Copy] [Kimi] [REL]