Beyond Data and Model Parallelism for Deep Neural Networks


Authors: Zhihao Jia, Matei Zaharia, Alex Aiken

Existing deep learning systems commonly parallelize deep neural network (DNN) training using data or model parallelism, but these strategies often result in suboptimal parallelization performance. We introduce SOAP, a more comprehensive search space of parallelization strategies for DNNs that includes strategies to parallelize a DNN in the Sample, Operator, Attribute, and Parameter dimensions. We present FlexFlow, a deep learning engine that uses guided randomized search of the SOAP space to find a fast parallelization strategy for a specific parallel machine. To accelerate this search, FlexFlow introduces a novel execution simulator that can accurately predict a parallelization strategy’s performance and is three orders of magnitude faster than prior approaches that execute each strategy. We evaluate FlexFlow with six real-world DNN benchmarks on two GPU clusters and show that FlexFlow increases training throughput by up to 3.3× over state-of-the-art approaches, even when including its search time, and also improves scalability.
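As a rough illustration of what a guided randomized search over parallelization strategies can look like, here is a minimal Python sketch. It is not FlexFlow's code or API. The strategy encoding (one split degree per operator in the sample, attribute, and parameter dimensions, with the operator dimension reflected only in letting each operator have its own configuration), the `toy_cost` function standing in for the execution simulator, and the Metropolis-style acceptance rule are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random

# Toy encoding (an assumption, not FlexFlow's): every operator gets a split
# degree in each of these dimensions, and operators may differ from each other.
DIMS = ("sample", "attribute", "parameter")
DEGREES = (1, 2, 4)


def toy_cost(strategy, num_devices=4):
    """Made-up stand-in for an execution simulator; returns a pretend step time."""
    total = 0.0
    for i, cfg in enumerate(strategy):
        degree = cfg["sample"] * cfg["attribute"] * cfg["parameter"]
        if degree > num_devices:
            return math.inf  # strategy needs more devices than the machine has
        total += 100.0 / degree                # pretend compute shrinks with parallelism
        total += 5.0 * (cfg["parameter"] - 1)  # pretend overhead for splitting parameters
        if i > 0 and cfg != strategy[i - 1]:
            total += 10.0                      # pretend data movement between layouts
    return total


def propose(strategy, rng):
    """Copy the strategy and re-draw the configuration of one random operator."""
    new = [dict(cfg) for cfg in strategy]
    op = rng.randrange(len(new))
    new[op] = {d: rng.choice(DEGREES) for d in DIMS}
    return new


def search(num_ops=5, steps=2000, temperature=5.0, seed=0):
    """Randomized search: keep improvements, sometimes keep small regressions."""
    rng = random.Random(seed)
    current = [{d: 1 for d in DIMS} for _ in range(num_ops)]  # start fully serial
    current_cost = toy_cost(current)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = propose(current, rng)
        cost = toy_cost(candidate)
        # Accept if no worse; otherwise accept with a probability that decays
        # with the cost gap (infeasible candidates get probability 0).
        accept_prob = math.exp((current_cost - cost) / temperature)
        if cost <= current_cost or rng.random() < accept_prob:
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    strategy, cost = search()
    print("best pretend step time:", cost)
    for op_index, cfg in enumerate(strategy):
        print(op_index, cfg)
```

The point of the sketch is the division of labor: the search only ever calls a cheap cost function, which is why a fast simulator matters. Replacing `toy_cost` with a call that actually executes each candidate would make every one of the 2000 steps far more expensive.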

Subject: MLSYS.2019

