LeapGNN: Accelerating Distributed GNN Training Leveraging Feature-Centric Model Migration

Authors: Weijian Chen, Shuibing He, Haoyang Qu, Xuechen Zhang

Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a feature-centric framework that reverses this paradigm by bringing GNN models to vertex features. To make it truly effective, we first propose a micrograph-based training strategy that leverages a refined structure to enhance locality, combined with the model migration technique, to minimize remote feature retrieval. Then, we devise a feature pre-gathering approach that merges multiple fetch operations into a single one to eliminate redundant feature transmissions. Finally, we employ a micrograph-based merging method that adjusts the number of micrographs for each worker to minimize kernel switches and synchronization overhead. Our experimental results demonstrate that LeapGNN achieves a performance speedup of up to 4.2× compared to the state-of-the-art method, namely P3.
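The feature pre-gathering idea described above, merging multiple fetch operations into one so that no remote feature is sent twice, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-of-ID-lists representation of per-micrograph requests, and the `local_ids` set are all hypothetical, chosen only to show how deduplicating requested vertex IDs before fetching reduces the number of transmitted features.

```python
def pregather_ids(micrograph_requests, local_ids):
    """Merge per-micrograph vertex requests into one deduplicated remote fetch list.

    micrograph_requests: list of lists of vertex IDs (hypothetical layout).
    local_ids: set of vertex IDs whose features already reside on this worker.
    """
    merged = set()
    for ids in micrograph_requests:
        # Only vertices stored on other workers need to be fetched.
        merged.update(v for v in ids if v not in local_ids)
    return sorted(merged)


def count_naive_fetches(micrograph_requests, local_ids):
    """Count remote features transferred if each micrograph fetches independently."""
    return sum(1 for ids in micrograph_requests for v in ids if v not in local_ids)


if __name__ == "__main__":
    requests = [[1, 2, 5], [2, 5, 7], [5, 8]]  # illustrative vertex IDs
    local = {1, 8}
    print("separate fetches:", count_naive_fetches(requests, local))
    print("single merged fetch:", pregather_ids(requests, local))
```

In this toy input, vertices 2 and 5 are requested by more than one micrograph, so the merged fetch transfers each of them once instead of once per request.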

chen-weijian-leap@fast25@USENIX
