More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
Qirong Ho · James Cipar · Henggang Cui · Seunghak Lee · Jin Kyu Kim · Phillip B. Gibbons · Garth Gibson · Greg Ganger · Eric Xing

Sat Dec 07 07:00 PM -- 11:59 PM (PST) @ Harrah's Special Events Center, 2nd Floor

We propose a parameter server system for distributed ML that follows a Stale Synchronous Parallel (SSP) model of computation, which maximizes the time computational workers spend doing useful work on ML algorithms while still providing correctness guarantees. The parameter server provides an easy-to-use shared interface for read/write access to an ML model's values (parameters and variables), and the SSP model allows distributed workers to read older, stale versions of these values from a local cache instead of waiting to fetch them from central storage. This significantly increases the proportion of time workers spend computing, as opposed to waiting. Furthermore, the SSP model ensures ML algorithm correctness by limiting the maximum age of the stale values. We provide a proof of correctness under SSP, as well as empirical results demonstrating that the SSP model achieves faster algorithm convergence on several different ML problems, compared to fully-synchronous and asynchronous schemes.
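The abstract combines two mechanisms: workers read possibly stale parameter values from a local cache, and SSP bounds how stale those values may be. The Python sketch below illustrates both ideas under simplifying assumptions and is not the authors' parameter server. The class `SSPClock`, the function `read_param`, the staleness bound `staleness` (s), and the rule that a cached value is fresh enough when its clock is at least c - s are hypothetical choices made for illustration. The sketch is single-threaded. A real system would block waiting workers and communicate across machines.

```python
from dataclasses import dataclass, field


@dataclass
class SSPClock:
    """Hypothetical bounded-staleness gate that tracks one clock per worker."""
    num_workers: int
    staleness: int  # s: max clocks a worker may run ahead of the slowest one
    clocks: list = field(init=False)

    def __post_init__(self) -> None:
        self.clocks = [0] * self.num_workers

    def can_proceed(self, worker: int) -> bool:
        # A worker may start its next clock only if it is at most s clocks
        # ahead of the slowest worker; otherwise it would have to wait.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker: int) -> None:
        # Called when a worker finishes one clock (e.g. one iteration).
        self.clocks[worker] += 1


def read_param(cache, server, key, worker_clock, staleness):
    """Return a value for `key`, serving it from the local cache when fresh enough.

    `cache` and `server` are hypothetical dicts mapping key -> (value, clock).
    """
    cached = cache.get(key)
    if cached is not None and cached[1] >= worker_clock - staleness:
        return cached[0]  # stale but within the bound: no round-trip to the server
    value, clock = server[key]  # missing or too old: refresh from the server
    cache[key] = (value, clock)
    return value
```

With `staleness=0` the gate reduces to bulk synchronous execution, because no worker can move past the slowest one. A very large `staleness` approaches fully asynchronous execution, which is the trade-off the abstract compares against.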

Author Information

Qirong Ho (Petuum, Inc.)
James Cipar (Carnegie Mellon University)
Henggang Cui (Carnegie Mellon University)
Seunghak Lee (Carnegie Mellon University)
Jin Kyu Kim (Carnegie Mellon University)
Phillip B. Gibbons (Intel Labs)
Garth Gibson (Vector Institute and Carnegie Mellon University)
Greg Ganger (Carnegie Mellon University)
Eric Xing (Petuum, Inc. / Carnegie Mellon University)
