n3M8h9mqDm@OpenReview

Total: 1

#1 Generalization Bounds for Rank-sparse Neural Networks

Authors: Antoine Ledent, Rodrigo Alves, Yunwen Lei

It has recently been observed in much of the literature that neural networks exhibit a bottleneck rank property: at larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be of approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the ``bottleneck rank'', which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten $p$ quasi-norm of the neural network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks that exploit the approximate low-rank structure of the weight matrices when it is present. The final results rely on the Schatten $p$ quasi-norms of the weight matrices: for small $p$, the bounds exhibit a sample complexity of $\widetilde{O}(WrL^2)$, where $W$ and $L$ are the width and depth of the neural network respectively and $r$ is the rank of the weight matrices. As $p$ increases, the bound behaves more like a norm-based bound. The proof techniques involve a careful interpolation between the parametric and norm-based regimes. We also demonstrate in experiments that this bound outperforms both classic parameter-counting and norm-based bounds in the typical overparametrized regime.
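The central quantity in the abstract is the Schatten $p$ quasi-norm of each weight matrix, defined as $(\sum_i \sigma_i^p)^{1/p}$ over the singular values $\sigma_i$ (a quasi-norm for $0 < p < 1$). The sketch below only illustrates this quantity on synthetic low-rank weight matrices; the matrix shapes, rank, and values of $p$ are hypothetical choices for illustration, and this is not the paper's bound computation.

```python
import numpy as np

def schatten_quasi_norm(W, p):
    """Schatten-p quasi-norm of W: (sum_i sigma_i(W)^p)^(1/p)."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return np.sum(sigma ** p) ** (1.0 / p)

# Hypothetical example: depth-4, width-128 network whose weight
# matrices are exactly rank 8, mimicking the rank-sparse setting.
rng = np.random.default_rng(0)
width, depth, rank = 128, 4, 8
layers = [rng.standard_normal((width, rank)) @ rng.standard_normal((rank, width))
          for _ in range(depth)]

for p in (0.25, 0.5, 1.0):
    norms = [schatten_quasi_norm(W, p) for W in layers]
    print(f"p={p}: per-layer Schatten quasi-norms = {np.round(norms, 2)}")
```

For smaller $p$, the quasi-norm is driven by the number of non-negligible singular values (the rank), which is consistent with the bound behaving like a parameter count in $r$ as $p \to 0$ and like a norm-based bound as $p$ grows.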

Subject: NeurIPS.2025 - Poster