PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi

Thu Dec 12 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #203

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well, or fail to achieve the target test accuracy. We propose a low-rank gradient compressor that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD. The proposed algorithm is the only method evaluated that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. We demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets.
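The abstract does not spell out the compressor, so the following is only a minimal sketch of the general idea, assuming a single rank-1 power-iteration step on a gradient matrix. All function names are hypothetical, plain Python lists stand in for tensors, and `mean_vectors` stands in for an all-reduce mean across workers. It illustrates why a low-rank factor can be aggregated with all-reduce: the product of a gradient matrix with a shared vector is linear in the gradient, so averaging the compressed factors gives the same result as compressing the averaged gradient.

```python
import math

def matvec(m, v):
    """Return m @ v for a list-of-rows matrix m and a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rmatvec(m, v):
    """Return m.T @ v without materializing the transpose."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]

def mean_vectors(vectors):
    """Assumed stand-in for an all-reduce mean: element-wise average over workers."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def normalize(v):
    """Scale v to unit Euclidean norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

def compress_and_aggregate(worker_grads, q):
    """One rank-1 power-iteration round over all workers' gradient matrices.

    Returns (p, q_new) such that the outer product p q_new^T approximates
    the mean gradient. Only vectors are averaged, never full matrices.
    """
    p = normalize(mean_vectors([matvec(g, q) for g in worker_grads]))
    q_new = mean_vectors([rmatvec(g, p) for g in worker_grads])
    return p, q_new

def decompress(p, q):
    """Rebuild the rank-1 approximation as the outer product p q^T."""
    return [[pi * qj for qj in q] for pi in p]

# Toy data: two workers whose gradients differ by +noise and -noise,
# so their mean is exactly the rank-1 matrix u v^T.
u = [1.0, 2.0, 3.0]
v = [0.5, -1.0]
noise = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
g1 = [[u[i] * v[j] + noise[i][j] for j in range(2)] for i in range(3)]
g2 = [[u[i] * v[j] - noise[i][j] for j in range(2)] for i in range(3)]

p, q_new = compress_and_aggregate([g1, g2], q=[1.0, 1.0])
approx = decompress(p, q_new)  # matches u v^T up to floating-point error
```

In this toy case each worker communicates 3 + 2 = 5 numbers instead of the 6 entries of the full matrix; the saving grows with matrix size, since a rank-1 factorization of an n × m matrix needs n + m values rather than n · m. The reconstruction is exact here only because the mean gradient was built to be rank 1; for a general gradient the result is an approximation.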

Author Information

Thijs Vogels (EPFL)
Sai Praneeth Karimireddy (EPFL)

I am a second-year PhD student working on convex and non-convex optimization with Prof. Martin Jaggi. My focus is on designing faster and more scalable optimization algorithms for machine learning. Some of my preliminary results and the problems I am currently working on: 1. Robust accelerated algorithms: Nesterov acceleration modified to be robust to noise. 2. Faster algorithms that take second-order information about the function into account. 3. An $O(1/t^2)$-rate *affine invariant* algorithm for constrained optimization. 4. A Frank-Wolfe algorithm for non-smooth functions using 'noisy smoothing'.

Martin Jaggi (EPFL)

More from the Same Authors