
CSER: Communication-efficient SGD with Error Reset
Cong Xie · Shuai Zheng · Sanmi Koyejo · Indranil Gupta · Mu Li · Haibin Lin

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #1130
The scalability of distributed stochastic gradient descent (SGD) is currently limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. CSER rests on two ideas. First, a new technique called "error reset" adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging the advantages of both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that, when combined with highly aggressive compressors, the CSER algorithms accelerate distributed training by nearly $10\times$ for CIFAR-100 and by $4.5\times$ for ImageNet.
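
To make the idea of compressing updates while tracking residual errors more concrete, here is a minimal toy sketch in plain Python. It is not the algorithm from the paper: the top-k sparsifier, the function names (`top_k`, `compress_with_residual`, `toy_round`, `reset_errors`), and the exact reset rule are all assumptions chosen for illustration. Each worker sends only the largest entries of its update, keeps the rest as a local residual, and a later reset step folds the averaged residuals back into the shared model.

```python
def top_k(vec, k):
    """Illustrative compressor: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def compress_with_residual(vec, k):
    """Split vec into the part that is sent and the part that stays local."""
    sent = top_k(vec, k)
    residual = [v - s for v, s in zip(vec, sent)]
    return sent, residual


def average(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def toy_round(model, updates, residuals, k):
    """One communication round: apply the averaged compressed updates to the
    shared model and accumulate what was not sent in each worker's residual."""
    sent_all, new_residuals = [], []
    for update, err in zip(updates, residuals):
        sent, left = compress_with_residual(update, k)
        sent_all.append(sent)
        new_residuals.append([e + l for e, l in zip(err, left)])
    mean_sent = average(sent_all)
    new_model = [m - s for m, s in zip(model, mean_sent)]
    return new_model, new_residuals


def reset_errors(model, residuals):
    """Hypothetical reset step: fold the averaged residuals into the shared
    model, then clear every worker's residual."""
    mean_err = average(residuals)
    new_model = [m - e for m, e in zip(model, mean_err)]
    cleared = [[0.0] * len(model) for _ in residuals]
    return new_model, cleared


# Two workers, three parameters, each worker sends one entry per round.
updates = [[1.0, -3.0, 0.5], [2.0, 0.25, -0.5]]
model, residuals = toy_round([0.0, 0.0, 0.0], updates, [[0.0] * 3, [0.0] * 3], k=1)
model, residuals = reset_errors(model, residuals)
```

In this toy setting, the shared model after the reset equals what an uncompressed averaged update would have produced, which is the intuition behind periodically clearing accumulated compression error. How often to reset and how to combine it with partial synchronization are design choices this sketch does not address.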

Author Information

Cong Xie (University of Illinois Urbana-Champaign)
Shuai Zheng (Amazon Web Services)
Sanmi Koyejo (Illinois / Google)

Sanmi Koyejo is an Assistant Professor in the Department of Computer Science at Stanford University. Koyejo also spends time at Google as a part of the Brain team. Koyejo's research interests are in developing the principles and practice of trustworthy machine learning. Additionally, Koyejo focuses on applications to neuroscience and healthcare. Koyejo has been the recipient of several awards, including a best paper award from the Conference on Uncertainty in Artificial Intelligence (UAI), a Skip Ellis Early Career Award, and a Sloan Fellowship. Koyejo serves as the president of the Black in AI organization.

Indranil Gupta (UIUC)
Mu Li (Amazon)
Haibin Lin (Amazon Web Services)

More from the Same Authors