We consider the empirical distribution of the embeddings of a $k$-layer polynomial GNN on a semi-supervised node classification task and prove a central limit theorem for it. Assuming a community-based model for the underlying graph with growing average degree $\nu_n\to\infty$, we show that the empirical distribution of the centered features, when scaled by $\nu_{n}^{k-1/2}$, converges in 1-Wasserstein distance to a centered stable mixture of multivariate normal distributions. In addition, the joint empirical distribution of the uncentered features and labels, when normalized by $\nu_n^k$, approaches that of a mixture of multivariate normal distributions with stable means and covariance matrices vanishing as $\nu_n^{-1}$. We explicitly identify the asymptotic means and covariances, showing that the mixture collapses towards a one-dimensional version as $k$ increases. Our results provide a precise and nuanced lens on how oversmoothing presents itself in the large-graph limit in the sparse regime. In particular, we show that training with cross-entropy on these embeddings is asymptotically equivalent to training on these nearly collapsed Gaussian mixtures.
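A schematic rendering of the two limits, offered only as a sketch: the symbols $H_v^{(k)}$ (layer-$k$ embedding of node $v$), $y_v$ (its label), and the mixture parameters $\pi_c$, $\mu_c$, $\Sigma_c$ are illustrative notation rather than the paper's own, and ``scaled by $\nu_n^{k-1/2}$'' is read here as dividing by that factor.
% Illustrative only: the notation and exact form below are assumptions, not the theorem as stated in the paper.
\[
\frac{1}{n}\sum_{v=1}^{n}\delta_{\nu_n^{-(k-1/2)}\left(H_v^{(k)}-\bar{H}^{(k)}\right)}
\;\xrightarrow{\;W_1\;}\;
\sum_{c}\pi_c\,\mathcal{N}\!\left(0,\Sigma_c\right),
\qquad
\frac{1}{n}\sum_{v=1}^{n}\delta_{\left(\nu_n^{-k}H_v^{(k)},\,y_v\right)}
\;\approx\;
\sum_{c}\pi_c\,\mathcal{N}\!\left(\mu_c,\nu_n^{-1}\Sigma_c\right)\otimes\delta_{y_c},
\]
where $\bar{H}^{(k)}$ denotes an appropriate centering and $y_c$ the label attached to community $c$; the first display corresponds to the fluctuation-scale CLT for centered features, the second to the joint law of normalized features and labels with means stable in $n$ and covariances shrinking at rate $\nu_n^{-1}$.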