lo@osdi23@USENIX

Total: 1

#1 RON: One-Way Circular Shortest Routing to Achieve Efficient and Bounded-waiting Spinlocks [PDF2] [Copy] [Kimi1] [REL]

Authors: Shiwu Lo ; Han-Ting Lin ; Yao-Hung Hsieh ; Chao-Ting Lin ; Yu-Hsueh Fang ; Ching-Shen Lin ; Ching-Chun (Jim) Huang ; Kam Yiu Lam ; Yuan-Hao Chang

As the number of processor cores increases, the efficiency of accessing shared variables through the lock-unlock method decreases. A NUMA-aware algorithm, which only considers the transmission delay between processors, may not fully utilize the connection network of a multi-core processor. This limits the scalability of a multi-core processor due to the large amount of low- and variable-cost data sharing between cores. The problem is that the reduction in communication cost cannot compensate for the increase in the time complexity of the spinlocks, and the farthest transmission distance becomes longer with more cores. We propose a method called Routing on Network-on-chip (RON) to minimize the communication cost between cores by using a routing table and pre-calculating an optimized locking-unlocking order. RON delivers locks and data in a one-way circular manner among cores to (1) minimize global data movement cost and (2) achieve bounded waiting time. Microbenchmarks provide quantitative analysis, while multi-core benchmarks show performance under various workloads. In terms of user space performance, RON improves the performance of Google LevelDB by 22.1% and 24.2% compared to ShflLock and C-BO-MCS, respectively. In the kernel space, RON is 1.8 times faster than using ShflLock for Google LevelDB. RON-plock solves the problem of oversubscription with constant space complexity and achieves 3.7 times and 18.9 times better performance than ShflLock-B and C-BO-MCS-B, respectively.