Tdl89SZItB@OpenReview

Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

Authors: Zijie Geng, Jie Wang, Ziqi Liu, Feng Ju, Yiming Li, Xing Li, Mingxuan Yuan, Jianye HAO, Defu Lian, Enhong Chen, Feng Wu

Key-Value (KV) cache eviction, which retains the KV pairs of the most important tokens while discarding less important ones, is a critical technique for optimizing both memory usage and inference latency in large language models (LLMs). However, existing approaches often rely on simple heuristics, such as attention weights, to measure token importance, overlooking the spatial relationships between token value states in the vector space. This often leads to suboptimal token selection and thus to performance degradation. To tackle this problem, we propose a novel method, **AnDPro** (**An**chor **D**irection **Pro**jection), which introduces a projection-based scoring function to measure token importance more accurately. Specifically, AnDPro operates in the space of value vectors and uses the projections of these vectors onto an *"Anchor Direction"* (the direction of the pre-eviction output) to measure token importance and guide token selection. Experiments on 16 datasets from the LongBench benchmark demonstrate that AnDPro can maintain 96.07% of the full-cache accuracy using only a 3.44% KV cache budget, a 46.0% reduction in KV cache budget relative to previous state-of-the-art methods without compromising quality.
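
To make the idea of projection-based scoring concrete, here is a minimal single-head sketch in plain Python. It is not the authors' implementation, since the abstract does not give the exact scoring function. It assumes the score of a token is its attention weight times the projection of its value vector onto the unit vector along the pre-eviction output. The function names `anchor_projection_scores` and `select_tokens`, the toy inputs, and the top-k selection rule are all hypothetical and chosen only for illustration.

```python
import math

def anchor_projection_scores(attn, values):
    """Assumed score: attention weight times the projection of the value
    vector onto the anchor direction (unit vector along the output)."""
    dim = len(values[0])
    # Pre-eviction output: attention-weighted sum of the value vectors.
    output = [sum(a * v[j] for a, v in zip(attn, values)) for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in output))
    if norm == 0.0:
        # Degenerate case: no well-defined anchor direction.
        return [0.0] * len(values)
    anchor = [x / norm for x in output]
    # Under this assumed form, the scores sum to the norm of the output.
    return [a * sum(vj * uj for vj, uj in zip(v, anchor))
            for a, v in zip(attn, values)]

def select_tokens(attn, values, budget):
    """Keep the `budget` highest-scoring token indices (hypothetical rule)."""
    scores = anchor_projection_scores(attn, values)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

# Toy example: token 1 has a large attention weight, but its value vector
# points against the output direction, so it receives a negative score.
attn = [0.4, 0.3, 0.2, 0.1]
values = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
scores = anchor_projection_scores(attn, values)
kept = select_tokens(attn, values, budget=2)  # kept == [0, 3]
```

In this toy case, ranking by attention weight alone would keep tokens 0 and 1, whereas the projection-based score keeps tokens 0 and 3. Token 3 has the smallest attention weight, but its value vector is long and well aligned with the output, so under the assumed score it contributes the most to it.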

Subject: NeurIPS.2025 - Poster