Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers


Authors: Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib

Graph Convolutional Networks (GCNs) are crucial across numerous domains, particularly when applied to large-scale graphs. However, distributed full-batch GCN training on such graphs suffers from inefficient memory access patterns and high communication overhead. To address these challenges, we introduce an efficient and scalable distributed GCN training framework tailored for CPU-powered supercomputers. Our contributions are threefold: (1) we develop general and efficient aggregation operators designed for irregular memory access, (2) we propose a hierarchical aggregation scheme that reduces communication costs without altering the graph structure, and (3) we present a communication-aware quantization scheme to enhance performance. Experimental results demonstrate that the framework achieves a speedup of up to 6× over state-of-the-art (SoTA) implementations and scales to thousands of HPC-grade CPUs on the largest publicly available datasets, without sacrificing model convergence or accuracy. Moreover, owing to its effective strong scaling, the framework outperforms SoTA GPU-based and CPU-based distributed full-batch GCN training frameworks in absolute performance on large-scale graphs.
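The abstract names two mechanisms without giving details: neighbor aggregation with irregular memory access, and quantization of communicated data. The sketch below is a generic illustration of both ideas and is not the paper's implementation. It assumes a CSR adjacency layout and symmetric per-row int8 quantization, and all function names (`aggregate_csr`, `quantize_int8`, `dequantize`) are hypothetical.

```python
from typing import List, Tuple


def aggregate_csr(indptr: List[int], indices: List[int],
                  feats: List[List[float]]) -> List[List[float]]:
    """Sum the feature rows of each vertex's neighbors (CSR adjacency)."""
    num_vertices = len(indptr) - 1
    dim = len(feats[0]) if feats else 0
    out = [[0.0] * dim for _ in range(num_vertices)]
    for v in range(num_vertices):
        for e in range(indptr[v], indptr[v + 1]):
            u = indices[e]  # neighbor id: access to feats[u] is irregular
            for k in range(dim):
                out[v][k] += feats[u][k]
    return out


def quantize_int8(row: List[float]) -> Tuple[List[int], float]:
    """Map a float row to integer codes in [-127, 127] plus one scale."""
    peak = max((abs(x) for x in row), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    return [round(x / scale) for x in row], scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    # Toy graph: vertex 0 has neighbors {1, 2}, vertex 1 has {0}, vertex 2 none.
    indptr, indices = [0, 2, 3, 3], [1, 2, 0]
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(aggregate_csr(indptr, indices, feats))
    codes, scale = quantize_int8([1.0, -0.5, 0.25])
    print(codes, dequantize(codes, scale))
```

In a distributed setting, only feature rows that cross a process boundary would need to be quantized before being sent, which trades a bounded rounding error (at most half a quantization step per value) for roughly a 4× smaller message compared with 32-bit floats.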

Subjects: Distributed, Parallel, and Cluster Computing; Performance

Published: 2024-11-25 00:52:18 UTC

arXiv: 2411.16025
