Provable Benefits of Sinusoidal Activation for Modular Addition

This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact realizations for any fixed length $m$ and, with bias, width-$2$ exact realizations uniformly over all lengths. In contrast, the width of ReLU networks must scale linearly with $m$ to interpolate, and they cannot simultaneously fit two lengths with different residues modulo $p$. We then provide a novel Natarajan-dimension generalization bound for sine networks, yielding nearly optimal sample complexity $\widetilde{\mathcal{O}}(p)$ for ERM over constant-width sine networks. We also derive width-independent, margin-based generalization for sine networks in the overparametrized regime and validate it. Empirically, sine networks generalize consistently better than ReLU networks across regimes and exhibit strong length extrapolation.
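To make the width-$2$ claim concrete, here is a minimal sketch of one way two sine units can realize modular addition exactly for a fixed length $m$ without bias. It is an illustration under stated assumptions, not necessarily the paper's construction. The input is assumed to be encoded as a count vector over residues (a sum of one-hot tokens), and the function names `sine_mlp_mod_add` and `check_all` are hypothetical. Since the counts always sum to $m$, adding $\pi/(2m)$ to every input weight shifts the phase by $\pi/2$, which turns the second sine unit into a cosine.

```python
import itertools
import numpy as np

def sine_mlp_mod_add(tokens, p):
    """Width-2 sine network (no bias) predicting sum(tokens) mod p.

    Assumption: the input is a count vector over residues 0..p-1,
    so its entries sum to m = len(tokens).
    """
    m = len(tokens)
    counts = np.bincount(tokens, minlength=p).astype(float)
    k = np.arange(p)
    # First unit: pre-activation 2*pi*s/p, where s = sum of tokens.
    w_sin = 2 * np.pi * k / p
    # Second unit: the extra pi/(2m) per count adds pi/2 in total,
    # so sin(.) of this pre-activation equals cos(2*pi*s/p).
    w_cos = 2 * np.pi * k / p + np.pi / (2 * m)
    hidden = np.sin(np.array([w_sin @ counts, w_cos @ counts]))
    # Output layer: logit_r = cos(2*pi*(s - r)/p), maximized at r = s mod p.
    r = np.arange(p)
    V = np.stack([np.sin(2 * np.pi * r / p), np.cos(2 * np.pi * r / p)], axis=1)
    logits = V @ hidden
    return int(np.argmax(logits))

def check_all(p, m):
    """Exhaustively compare the network against sum(tokens) % p."""
    return all(
        sine_mlp_mod_add(t, p) == sum(t) % p
        for t in itertools.product(range(p), repeat=m)
    )

print(check_all(7, 3))
```

The weights depend on $m$ only through the $\pi/(2m)$ shift, which is why this sketch covers a single fixed length. Replacing that shift with a bias of $\pi/2$ on the second unit removes the dependence on $m$, in line with the abstract's statement that bias gives realizations uniform over all lengths.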

Subjects: Machine Learning

Published: 2025-11-28 18:37:03 UTC

arXiv: 2511.23443