skiadopoulos@osdi24@USENIX

High-throughput and Flexible Host Networking for Accelerated Computing

Authors: Athinagoras Skiadopoulos ; Zhiqiang Xie ; Mark Zhao ; Qizhe Cai ; Saksham Agarwal ; Jacob Adelmann ; David Ahern ; Carlo Contavalli ; Michael Goldflam ; Vitaly Mayatskikh ; Raghu Raja ; Daniel Walton ; Rachit Agarwal ; Shrijeet Mukherjee ; Christos Kozyrakis

Modern network hardware is able to meet the stringent bandwidth demands of applications like GPU-accelerated AI. However, existing host network stacks offer a hard tradeoff between performance (in terms of sustained throughput when compared to network hardware capacity) and flexibility (in terms of the ability to select, customize, and extend different network protocols). This paper explores a clean-slate approach to simultaneously offer high performance and flexibility. We present a co-design of the NIC hardware and the software stack to achieve this. The key idea in our design is the physical separation of the data path (payload transfer between network and application buffers) and the control path (header processing and transport-layer decisions). The NIC enables a high-performance zero-copy data path, independent of the placement of the application (CPU, GPU, FPGA, or other accelerators). The software stack provides a flexible control path by enabling the integration of any network protocol, executing in any environment (in the kernel, in user space, or in an accelerator). We implement and evaluate ZeroNIC, a prototype that combines an FPGA-based NIC with a software stack that integrates the Linux TCP protocol. We demonstrate that ZeroNIC achieves RDMA-like throughput while maintaining the benefits of robust protocols like TCP under various network perturbations. For instance, ZeroNIC enables a single TCP flow to saturate a 100Gbps link while utilizing only 17% of a single CPU core. ZeroNIC improves NCCL and Redis throughput by 2.66X and 3.71X, respectively, over Linux TCP on a Mellanox ConnectX-6 NIC, without requiring application modifications.
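
The separation of data path and control path described in the abstract can be illustrated with a toy sketch. This is not ZeroNIC's implementation: the 8-byte header layout, the names `nic_split`, `ControlPath`, and `make_packet`, and the use of the sequence number as a buffer offset are all hypothetical choices made only for illustration. The sketch shows the general idea that payload bytes are placed once into the application buffer, while the protocol logic sees headers only.

```python
from dataclasses import dataclass

# Hypothetical toy header: 4-byte sequence number + 4-byte payload length.
HEADER_LEN = 8


@dataclass
class Header:
    seq: int     # byte offset of this payload within the flow
    length: int  # payload length in bytes


def make_packet(seq: int, payload: bytes) -> bytes:
    """Build a toy packet: header followed by payload."""
    return seq.to_bytes(4, "big") + len(payload).to_bytes(4, "big") + payload


def nic_split(packet: bytes, app_buffer: bytearray) -> Header:
    """Toy 'NIC': place the payload directly into the application buffer
    and hand only the header to the control path."""
    seq = int.from_bytes(packet[0:4], "big")
    length = int.from_bytes(packet[4:8], "big")
    payload = memoryview(packet)[HEADER_LEN:HEADER_LEN + length]
    # Data path: a single placement at the final location, no staging buffer.
    app_buffer[seq:seq + length] = payload
    return Header(seq, length)


class ControlPath:
    """Toy 'software stack': processes headers only and tracks in-order delivery."""

    def __init__(self) -> None:
        self.next_expected = 0
        self.pending: dict[int, int] = {}  # seq -> length of out-of-order segments

    def on_header(self, hdr: Header) -> int:
        # Ignore segments that were already delivered in order.
        if hdr.seq >= self.next_expected:
            self.pending[hdr.seq] = hdr.length
        # Advance past every contiguous segment seen so far.
        while self.next_expected in self.pending:
            self.next_expected += self.pending.pop(self.next_expected)
        return self.next_expected  # cumulative acknowledgment point
```

In this sketch, feeding the packets `make_packet(5, b"world")` and then `make_packet(0, b"hello")` through `nic_split` into a 10-byte buffer leaves the payloads in their final positions even though they arrived out of order, and the control path advances its acknowledgment point only once the gap is filled. Because the control path never touches payload bytes, it could in principle be swapped for a different protocol without changing how data is placed, which is the flexibility argument the abstract makes.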