31f0ece68edf17cf695fdf4e9c5e17d1@2020@MLSYS


#1 BPPSA: Scaling Back-propagation by Parallel Scan Algorithm

Authors: Shang Wang ; Yifan Bai ; Gennady Pekhimenko

In an era when the performance of a single compute device plateaus, software must be designed to scale on a massively parallel system for better runtime performance. However, in the context of training deep learning models, the commonly used back-propagation (BP) algorithm imposes a strong sequential dependency in the process of gradient computation. Under model parallelism, BP has a theoretical step complexity of Θ(n), where n represents the number of compute devices into which a model is partitioned, and this linear step count hinders its scalability in a parallel computing environment.
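The sequential dependency described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes each of the n partitions is summarized by a small dense Jacobian, and the names `matvec_t`, `sequential_backward`, and `make_stages` are hypothetical helpers introduced only for this illustration. The backward pass multiplies the incoming gradient by one transposed Jacobian per partition, and each product needs the result of the previous one, so the loop takes n dependent steps.

```python
import random


def matvec_t(J, g):
    """Return J^T g for a matrix J stored as a list of rows."""
    rows, cols = len(J), len(J[0])
    return [sum(J[r][c] * g[r] for r in range(rows)) for c in range(cols)]


def sequential_backward(jacobians, grad_out):
    """Propagate grad_out back through the stages, last stage first.

    jacobians[i] stands in for the Jacobian of stage i's output with
    respect to its input (an assumption of this sketch). Every iteration
    consumes the gradient produced by the previous iteration, so the
    iterations cannot run concurrently.
    """
    grad = grad_out
    grads = []
    steps = 0
    for J in reversed(jacobians):
        grad = matvec_t(J, grad)  # depends on the previous iteration
        grads.append(grad)
        steps += 1
    grads.reverse()  # grads[i] is the gradient at the input of stage i
    return grads, steps


def make_stages(n, dim, seed=0):
    """Build n random dim x dim Jacobians as illustrative stand-ins."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
        for _ in range(n)
    ]


n, dim = 4, 3
stages = make_stages(n, dim)
grads, steps = sequential_backward(stages, [1.0] * dim)
print(steps)  # one dependent step per stage
```

With n stages the loop performs n dependent steps regardless of how many devices are available, which is the Θ(n) step complexity the abstract refers to.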