48fecef47b19fe501d27d338b6d52582@2024@MLSYS

Total: 1

#1 Keyformer: KV Cache reduction through key tokens selection for Efficient Generative Inference

Authors: Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant Nair, Ilya Soloveychik, Purushotham Kamath

No summary was provided.