Total: 1
Graph Neural Network (GNN) on streaming graphs has gained increasing popularity. However, its practical deployment remains challenging, as the inference process relies on Runtime Embedding Computation (RTEC) to capture recent graph changes. This process incurs heavyweight multi-hop graph traversal overhead, which significantly undermines computation efficiency. We observe that the intermediate results for large portions of the graph remain unchanged during graph evolution, and thus redundant computations can be effectively eliminated through carefully designed incremental methods. In this work, we propose an efficient framework for incrementalizing RTEC on streaming graphs.The key idea is to decouple GNN computation into a set of generalized, fine-grained operators and safely reorder them, transforming the expensive full-neighbor GNN computation into a more efficient form over the affected subgraph. With this design, our framework preserves the semantics and accuracy of the original full-neighbor computation while supporting a wide range of GNN models with complex message-passing patterns. To further scale to graphs with massive historical results, we develop a GPU-CPU co-processing system that offloads embeddings to CPU memory with communication-optimized scheduling. Experiments across diverse graph sizes and GNN models show that our method reduces computation by 64%-99% and achieves 1.7x-145.8x speedups over existing solutions.