2025.emnlp-main.443@ACL

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation

Authors: Yan Li, Tianyi Zhang, Zechuan Li, Caren Han

Abstract: Transformer-based Large Language Models (LLMs) struggle with inputs that exceed their training context window because positional out-of-distribution (O.O.D.) values disrupt attention. Existing solutions, both fine-tuning-based and training-free, face challenges such as inefficiency, redundant interpolation, logit outliers, and loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logits to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on short-context tasks. GALI represents a step toward more robust and generalizable long-text processing in LLMs.
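
The abstract describes GALI only at a high level, so the sketch below is a minimal, hypothetical NumPy illustration of the general contrast between interpolating attention logits and interpolating positions. The helper names (rope, interpolated_logit), the keep threshold, and the linear remapping of out-of-window distances are assumptions made for illustration; this is not the paper's greedy interval-reuse procedure.

```python
# Hypothetical sketch of attention-logit interpolation for length
# extrapolation (illustrative only; not the paper's GALI algorithm).
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate an even-dimensional vector x by RoPE at integer position pos."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def interpolated_logit(q, k, rel_dist, max_train_pos, seq_len, keep=None):
    """Attention logit for a query/key pair at relative distance rel_dist,
    where rel_dist may exceed the pretrained window max_train_pos."""
    keep = max_train_pos // 2 if keep is None else keep
    if rel_dist <= keep:
        # Short-range distances reuse pretrained positional intervals exactly,
        # preserving local positional information.
        return rope(q, rel_dist) @ rope(k, 0)
    # Map the remaining distances linearly into the unused part of the
    # trained range (hypothetical mapping, for illustration only).
    frac_pos = keep + (rel_dist - keep) * (max_train_pos - keep) / (seq_len - 1 - keep)
    lo, hi = int(np.floor(frac_pos)), int(np.ceil(frac_pos))
    w = frac_pos - lo
    # Interpolate between logits computed at two *pretrained* integer
    # positions, instead of feeding a fractional position into RoPE;
    # this is what distinguishes logit interpolation from position interpolation.
    return (1 - w) * (rope(q, lo) @ rope(k, 0)) + w * (rope(q, hi) @ rope(k, 0))

# Example: a sequence twice the pretrained window length.
rng = np.random.default_rng(0)
q, k = rng.standard_normal(64), rng.standard_normal(64)
print(interpolated_logit(q, k, rel_dist=6000, max_train_pos=4096, seq_len=8192))
```

Because the blend happens in logit space, every RoPE rotation in the sketch uses an integer position the model saw during pretraining, which is one way to avoid the fractional-position logits that can produce outliers.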

Subject: EMNLP.2025 - Main