The Adaptive Complexity of Finding a Stationary Point | Cool Papers

#1 The Adaptive Complexity of Finding a Stationary Point [PDF] [Copy] [Kimi] [REL]

Authors: Huanjian Zhou, Andi Han, Akiko Takeda, Masashi Sugiyama

In large-scale applications, such as machine learning, it is desirable to design non-convex optimization algorithms with a high degree of parallelization. In this work, we study the adaptive complexity of finding a stationary point, which is the minimal number of sequential rounds required to achieve stationarity given polynomially many queries executed in parallel at each round. For the high-dimensional case, i.e., $d = \widetilde{\Omega}(\varepsilon^{-(2 + 2p)/p})$ , we show that for any (potentially randomized) algorithm, there exists a function with Lipschitz $p$ -th order derivatives such that the algorithm requires at least $\varepsilon^{-(p+1)/p}$ iterations to find an $\varepsilon$ -stationary point. Our lower bounds are tight and show that even with $\mathrm{poly}(d)$ queries per iteration, no algorithm has better convergence rate than those achievable with one-query-per-round algorithms. In other words, gradient descent, the cubic-regularized Newton's method, and the $p$ -th order adaptive regularization method are adaptively optimal. Our proof relies upon novel analysis with the characterization of the output for the hardness potentials based on a chain-like structure with random partition. For the constant-dimensional case, i.e., $d = \Theta(1)$ , we propose an algorithm that bridges grid search and gradient flow trapping, finding an approximate stationary point in constant iterations. Its asymptotic tightness is verified by a new lower bound on the required queries per iteration. We show there exists a smooth function such that any algorithm running with $\Theta(\log (1/\varepsilon))$ rounds requires at least $\widetilde{\Omega}((1/\varepsilon)^{(d-1)/2})$ queries per round. This lower bound is tight up to a logarithmic factor, and implies that the gradient flow trapping is adaptively optimal.

Subjects: Optimization and Control , Computational Complexity , Distributed, Parallel, and Cluster Computing

Publish: 2025-05-14 00:54:49 UTC

2505.09045

#1 The Adaptive Complexity of Finding a Stationary Point [PDF] [Copy] [Kimi] [REL]