zhu@osdi16@USENIX

Total: 1

#1 Gemini: A Computation-Centric Distributed Graph Processing System [PDF] [Copy] [Kimi] [REL]

Authors: Xiaowei Zhu ; Wenguang Chen ; Weimin Zheng ; Xiaosong Ma

Traditionally distributed graph processing systems have largely focused on scalability through the optimizations of inter-node communication and load balance. However, they often deliver unsatisfactory overall processing efficiency compared with shared-memory graph computing frameworks. We analyze the behavior of several graph-parallel systems and find that the added overhead for achieving scalability becomes a major limiting factor for efficiency, especially with modern multi-core processors and high-speed interconnection networks. Based on our observations, we present Gemini, a distributed graph processing system that applies multiple optimizations targeting computation performance to build scalability on top of efficiency. Gemini adopts (1) a sparse-dense signal-slot abstraction to extend the hybrid push-pull computation model from shared-memory to distributed scenarios, (2) a chunk-based partitioning scheme enabling low-overhead scaling out designs and locality-preserving vertex accesses, (3) a dual representation scheme to compress accesses to vertex indices, (4) NUMA-aware sub-partitioning for efficient intra-node memory accesses, plus (5) locality-aware chunking and fine-grained work-stealing for improving both inter-node and intra-node load balance, respectively. Our evaluation on an 8-node high-performance cluster (using five widely used graph applications and five real-world graphs) shows that Gemini significantly outperforms all well-known existing distributed graph processing systems, delivering up to 39.8x (from 8.91x) improvement over the fastest among them.