Democratizing Distributed Machine Learning
Speaker
Murali Annavaram
Date and Time
Friday, November 23rd, 2018 at 14:15.
Location
Polacksbacken, ITC, room 1113.
Abstract
Machine learning (ML) is a computationally intensive task: training on a single server, for example, leads to unacceptable latencies, so the computational load must be distributed across multiple machines. Distributing machine learning, however, brings several challenges. First, during distributed training the gradient vectors must be exchanged frequently across all machines to keep the model synchronized, which creates a significant communication bottleneck. Nvidia's DGX machines tackle this challenge with custom interconnects across GPUs, at enormous cost. Second, once computation is distributed, a single slow node (a straggler) can hold back the entire ensemble of servers. The typical remedy is widespread replication, which dramatically increases cost.
In this talk I will present some of our recent research results tackling these challenges, which will appear at the NIPS 2018 conference, as well as some of our ongoing work. In the first part of the talk I will present a communication-curtailing ML training approach called gradient vector quantization. We design a gradient compression scheme in which gradients can be aggregated directly in the compressed domain, so that communication can be hidden entirely within computation. In the second part of the talk I will present a coded computation framework that uses information encoding to perform computations on coded data, which in turn enables our approach to tolerate stragglers efficiently. We hope that with some of these approaches we can bring distributed machine learning within the reach of all users.
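To make the first idea concrete, the sketch below (plain NumPy, not code from the talk) quantizes per-worker gradients to signed integer codes with a shared scale, sums the integer codes across workers, and dequantizes once at the end, so that aggregation happens entirely in the compressed domain. The bit width, scaling rule, and worker count are illustrative assumptions, not the specific scheme presented in the talk.

# Toy illustration of gradient quantization with compressed-domain aggregation.
# Bit width, scaling rule, and worker count are arbitrary assumptions.
import numpy as np

BITS = 8                      # assumed quantization width
LEVELS = 2 ** (BITS - 1) - 1  # symmetric signed integer range

def quantize(grad, scale):
    """Map a float gradient to small integer codes using a shared scale."""
    return np.clip(np.rint(grad / scale), -LEVELS, LEVELS).astype(np.int32)

def dequantize(codes, scale):
    return codes.astype(np.float64) * scale

# Simulate gradients computed by several workers.
rng = np.random.default_rng(0)
worker_grads = [rng.normal(size=1000) for _ in range(4)]

# All workers agree on one scale, e.g. from the largest magnitude seen.
scale = max(np.abs(g).max() for g in worker_grads) / LEVELS

# Workers exchange only the small integer codes; the sum is taken directly
# on the codes, i.e. aggregation happens in the compressed domain.
aggregated_codes = sum(quantize(g, scale) for g in worker_grads)

# A single dequantization recovers an approximate average gradient.
avg_grad = dequantize(aggregated_codes, scale) / len(worker_grads)
exact_avg = np.mean(worker_grads, axis=0)
print("max quantization error:", np.abs(avg_grad - exact_avg).max())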
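The second idea can be illustrated with a textbook (3, 2) code for matrix-vector multiplication: two workers hold halves of the data matrix, a third holds their sum, and the results from any two workers suffice to recover the full product, so one straggler can simply be ignored. The sketch below shows this general construction only; it is not the specific encoding scheme from the talk.

# Toy illustration of coded matrix-vector multiplication for straggler tolerance.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
x = rng.normal(size=4)

# Split the data matrix into two halves and add one coded (summed) copy.
A1, A2 = A[:3], A[3:]
encoded_shards = {"w1": A1, "w2": A2, "w3": A1 + A2}

# Each worker multiplies its (possibly coded) shard by x.
partial = {name: shard @ x for name, shard in encoded_shards.items()}

# Suppose worker w2 is a straggler and never responds.
received = {k: v for k, v in partial.items() if k != "w2"}

# Any two of the three results are enough to decode the full product:
# A2 x = (A1 + A2) x - A1 x.
y1 = received["w1"]
y2 = received["w3"] - received["w1"]
y = np.concatenate([y1, y2])
print("decoding error:", np.abs(y - A @ x).max())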
Speaker Bio
Murali Annavaram is a Professor in the Ming Hsieh Department of Electrical Engineering at the University of Southern California. He held the Robert G. and Mary G. Lane Early Career Chair at USC. His research focuses on energy efficiency and reliability of high-performance computing, bandwidth-efficient big data computing by moving computation closer to data, runtime system design to enable dispersed computing on highly heterogeneous edge devices, and building processor architectures using superconducting devices. On the high-performance computing front, his group built the KnightShift server architecture, which has been demonstrated to exhibit near-perfect energy-proportional behavior. His group also proposed innovative microarchitectural features for improving the efficiency and reliability of graphics processing units (GPUs). On the big data front, his group designed a novel flash memory storage platform that automatically filters and summarizes large data sets to reduce the bandwidth demand on the CPU.