The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent

Authors: Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single- and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD successively reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than shallow networks require. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms.

Subjects: Machine Learning, Machine Learning

Published: 2025-02-19 18:58:28 UTC

arXiv: 2502.13961
