Pooling Engram Conditional Memory in Large Language Models using CXL

Authors: Ruiyang Ma, Teng Ma, Zhiyuan Su, Hantian Zha, Xinpeng Zhao, Xuchun Shang, Xingrui Yi, Zheng Liu, Zhu Cao, An Wu, Zhichong Dou, Ziqian Liu, Daikang Kuang, Guojie Luo

Engram conditional memory has emerged as a promising component for LLMs because it decouples static knowledge lookup from dynamic computation. Since Engram exhibits sparse access patterns and supports prefetching, its massive embedding tables are well-suited for offloading to lower-tier memory. In this paper, we propose using a Compute Express Link (CXL) memory pool for Engram storage. Compared to RDMA, CXL provides the fine-grained, low-latency access required by Engram's small, discrete retrievals. We integrate the CXL-based Engram pool into SGLang, achieving near-DRAM end-to-end performance. This provides a scalable and cost-efficient storage solution for future Engram-integrated LLMs without compromising inference performance.
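The access pattern the abstract relies on (sparse row lookups plus prefetching from a lower memory tier) can be illustrated with a minimal sketch. This is not the paper's implementation and uses no CXL or SGLang API: `PooledTable`, `run_steps`, and `sum_rows` are hypothetical names, the "pool" is an ordinary in-process list standing in for remote memory, and the sketch assumes the row indices for the next step are known before the current step finishes computing.

```python
from concurrent.futures import ThreadPoolExecutor
import random


class PooledTable:
    """Hypothetical stand-in for an embedding table held in a lower memory tier."""

    def __init__(self, num_rows, dim, seed=0):
        rng = random.Random(seed)
        # A real pool would live in remote or pooled memory, not a local list.
        self.rows = [[rng.random() for _ in range(dim)] for _ in range(num_rows)]

    def gather(self, row_ids):
        # Sparse access: fetch only the requested rows, never the whole table.
        return [self.rows[i] for i in row_ids]


def sum_rows(rows):
    """Toy compute step: element-wise sum of the fetched rows."""
    return [sum(col) for col in zip(*rows)]


def run_steps(pool, row_ids_per_step, compute):
    """Overlap the fetch for step i+1 with the computation of step i."""
    outputs = []
    if not row_ids_per_step:
        return outputs
    with ThreadPoolExecutor(max_workers=1) as executor:
        pending = executor.submit(pool.gather, row_ids_per_step[0])
        for i in range(len(row_ids_per_step)):
            rows = pending.result()  # rows needed by the current step
            if i + 1 < len(row_ids_per_step):
                # Prefetch: issue the next lookup before computing this step.
                pending = executor.submit(pool.gather, row_ids_per_step[i + 1])
            outputs.append(compute(rows))
    return outputs
```

For example, `run_steps(PooledTable(100, 4), [[1, 5, 9], [2, 3], [7]], sum_rows)` returns one summed vector per step. Each fetch touches only a few rows, which is why per-access latency and granularity matter more than bulk bandwidth for this kind of workload.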

Subjects: Hardware Architecture, Distributed, Parallel, and Cluster Computing

Published: 2026-03-10 14:13:02 UTC

2603.10087
