abadi@osdi16@USENIX

TensorFlow: A System for Large-Scale Machine Learning

Authors: Martín Abadi ; Paul Barham ; Jianmin Chen ; Zhifeng Chen ; Andy Davis ; Jeffrey Dean ; Matthieu Devin ; Sanjay Ghemawat ; Geoffrey Irving ; Michael Isard ; Manjunath Kudlur ; Josh Levenberg ; Rajat Monga ; Sherry Moore ; Derek G. Murray ; Benoit Steiner ; Paul Tucker ; Vijay Vasudevan ; Pete Warden ; Martin Wicke ; Yuan Yu ; Xiaoqiang Zheng

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous “parameter server” designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
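
The idea of a single dataflow graph that holds computation, shared state, and the operations that mutate that state can be illustrated with a toy sketch. The code below is not TensorFlow's API or implementation; every class and function name in it (`Variable`, `Constant`, `Mul`, `AssignSub`, `run`) is hypothetical and chosen only for illustration, and it ignores distribution and device placement entirely. It shows one property from the abstract: a graph is built once, and a mutating operation updates state that persists across repeated runs.

```python
# Toy illustration only: not TensorFlow's API or implementation.
# All class and function names here are hypothetical.

class Variable:
    """A stateful node: holds a mutable value that persists across runs."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Constant:
    """A stateless node that always yields the same value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Mul:
    """A stateless operation node with two input edges."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self):
        return self.a.evaluate() * self.b.evaluate()


class AssignSub:
    """An operation node that mutates shared state: var -= delta."""
    def __init__(self, var, delta):
        self.var, self.delta = var, delta

    def evaluate(self):
        self.var.value -= self.delta.evaluate()
        return self.var.value


def run(node):
    """Execute the subgraph needed to compute `node`."""
    return node.evaluate()


# Build the graph once: a gradient-descent-style update, w -= lr * grad.
w = Variable(1.0)
lr = Constant(0.5)
grad = Constant(0.5)
update = AssignSub(w, Mul(lr, grad))

# Run the same graph twice; the state held in `w` persists between runs.
first = run(update)
second = run(update)
print(first, second)
```

Because the update rule is itself an ordinary node in the graph, replacing `AssignSub` with a different mutating operation changes the training behavior without touching the rest of the graph, which is a small-scale analogue of the flexibility the abstract contrasts with built-in parameter-server state management.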