Extending the context length of Language Models (LMs) by improving Rotary Position Embedding (RoPE) has become a trend. While prior works mainly address RoPE's limitations within attention, this paper uncovers adverse effects on length generalization that stem from nearly all parts of LMs. Using *Discrete Signal Processing* theory, we show that RoPE enables periodic attention by implicitly performing a *Non-Uniform Discrete Fourier Transform*. However, this periodicity is undermined by spectrum damage caused by: 1) linear layers and activation functions outside of attention; 2) insufficiently trained frequency components arising from time-domain truncation. Building on these observations, we propose ***Fourier Position Embedding (FoPE)***, which enhances attention's frequency-domain properties to improve both its periodic extension and length generalization. FoPE constructs a *Fourier Series* and zeroes out the destructive frequency components, increasing model robustness against spectrum damage. Experiments across various model scales and benchmarks show that, within varying context windows, FoPE maintains more stable performance than the baselines. Several analyses and ablations further support our method and theoretical modeling.
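
To make the core idea concrete, the following is a minimal PyTorch sketch of a FoPE-style embedding: each RoPE frequency becomes the dominant term of a short Fourier series with small fixed coefficients, and frequencies below a floor are zeroed out (treated as a DC component). The function names (`fope_tables`, `apply_rotary`) and the hyperparameters (`num_extra`, `coeff_std`, `freq_floor`) are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch


def fope_tables(head_dim, seq_len, base=10000.0, num_extra=4,
                coeff_std=0.05, freq_floor=1e-3, device="cpu"):
    """Sketch of FoPE-style cos/sin tables (assumed simplification).

    * Each rotation pair d keeps its RoPE frequency w_d = base**(-2d/head_dim)
      as the dominant term of a short Fourier series.
    * `num_extra` additional frequencies with small, fixed random coefficients
      are added, so each dimension oscillates as a sum of sinusoids rather
      than a single one.
    * Frequencies below `freq_floor` are zeroed out, mimicking the clipping
      of under-trained low-frequency components.
    """
    half = head_dim // 2
    pos = torch.arange(seq_len, dtype=torch.float32, device=device)          # (T,)

    # Dominant RoPE frequencies, one per rotation pair.
    w = base ** (-2.0 * torch.arange(half, dtype=torch.float32,
                                     device=device) / head_dim)              # (D/2,)

    # Extra frequencies drawn from the same spectrum, with small fixed
    # coefficients (illustrative choice, not trained here).
    extra_w = w[torch.randint(0, half, (half, num_extra), device=device)]    # (D/2, E)
    extra_a = coeff_std * torch.randn(half, num_extra, device=device)        # (D/2, E)

    # Zero out under-trained (too-low) frequencies.
    w = torch.where(w >= freq_floor, w, torch.zeros_like(w))
    extra_w = torch.where(extra_w >= freq_floor, extra_w,
                          torch.zeros_like(extra_w))

    # Fourier series per dimension: unit-coefficient dominant term plus extras.
    dom_cos = torch.cos(pos[:, None] * w[None, :])                           # (T, D/2)
    dom_sin = torch.sin(pos[:, None] * w[None, :])
    ext_cos = (extra_a[None] * torch.cos(pos[:, None, None]
                                         * extra_w[None])).sum(-1)           # (T, D/2)
    ext_sin = (extra_a[None] * torch.sin(pos[:, None, None]
                                         * extra_w[None])).sum(-1)

    return dom_cos + ext_cos, dom_sin + ext_sin


def apply_rotary(x, cos_t, sin_t):
    """Apply a RoPE-style rotation using the tables above.

    x: (..., T, head_dim); the two halves of the last dim form rotation pairs.
    """
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos_t - x2 * sin_t,
                      x1 * sin_t + x2 * cos_t], dim=-1)


# Example usage (hypothetical shapes):
# q = torch.randn(2, 8, 128, 64)            # (batch, heads, T, head_dim)
# cos_t, sin_t = fope_tables(64, 128)
# q_rot = apply_rotary(q, cos_t, sin_t)
```

Folding the series into RoPE-style cos/sin tables keeps the sketch drop-in compatible with existing rotary-attention code; the key departures from plain RoPE are the extra fixed-coefficient frequencies and the zeroing of the under-trained low-frequency components.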