Total: 1

#1 Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference

Authors: Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant Nair, Ilya Soloveychik, Purushotham Kamath

No summary was provided.

Subject: MLSYS.2024