A Unified Fast Gradient Clipping Framework for DP-SGD

Weiwei Kong · Andres Munoz Medina

Great Hall & Hall B1+B2 (level 1) #1022
Thu 14 Dec 8:45 a.m. PST — 10:45 a.m. PST


A well-known numerical bottleneck in the differentially private stochastic gradient descent (DP-SGD) algorithm is the computation of the gradient norm for each example in a large input batch. When the loss function in DP-SGD consists of an intermediate linear operation, existing methods in the literature have proposed decompositions of gradients that are amenable to fast norm computations. In this paper, we present a framework that generalizes the above approach to arbitrary (possibly nonlinear) intermediate operations. Moreover, we show that for certain operations, such as fully-connected and embedding layer computations, further improvements to the runtime and storage costs of existing decompositions can be deduced using certain components of our framework. Finally, preliminary numerical experiments are given to demonstrate the substantial effects of the aforementioned improvements.
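For reference, the sketch below illustrates the kind of existing decomposition the abstract builds on, for a fully-connected layer y = x W applied at each of T positions of an example. It is not the paper's generalized framework. Each example's weight gradient is G_i = sum_t a_{i,t} g_{i,t}^T, and its squared Frobenius norm can be computed from two small Gram matrices without ever materializing G_i: ||G_i||_F^2 = sum_{t,s} (a_{i,t} . a_{i,s}) (g_{i,t} . g_{i,s}). The function names are hypothetical, and the example assumes NumPy and dense inputs of shape (batch, T, d_in).

```python
import numpy as np

def naive_per_example_grad_norms(a, g):
    """Baseline: materialize every per-example weight gradient, then take its norm.

    a: layer inputs, shape (batch, T, d_in)
    g: gradients of the loss w.r.t. layer outputs, shape (batch, T, d_out)
    """
    # G_i = sum_t a_{i,t} g_{i,t}^T, shape (batch, d_in, d_out): O(batch * d_in * d_out) memory
    grads = np.einsum("bti,bto->bio", a, g)
    return np.sqrt(np.sum(grads ** 2, axis=(1, 2)))

def ghost_per_example_grad_norms(a, g):
    """Same norms via ||G_i||_F^2 = sum_{t,s} (a_t . a_s)(g_t . g_s), never forming G_i."""
    aa = np.einsum("bti,bsi->bts", a, a)  # (batch, T, T) Gram matrix of layer inputs
    gg = np.einsum("bto,bso->bts", g, g)  # (batch, T, T) Gram matrix of output gradients
    return np.sqrt(np.sum(aa * gg, axis=(1, 2)))

def clipped_gradient_sum(a, g, clip_norm):
    """Sum of per-example gradients, each clipped to norm <= clip_norm, as in DP-SGD."""
    norms = ghost_per_example_grad_norms(a, g)
    # Clipping factor c_i = min(1, C / ||G_i||); the floor avoids division by zero
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # sum_i c_i G_i = sum_{i,t} c_i a_{i,t} g_{i,t}^T, computed in a single contraction
    return np.einsum("b,bti,bto->io", scale, a, g)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3, 5))  # batch=4, T=3, d_in=5
g = rng.standard_normal((4, 3, 2))  # d_out=2
print(np.allclose(naive_per_example_grad_norms(a, g), ghost_per_example_grad_norms(a, g)))  # True
```

The Gram-matrix route stores O(batch * T^2) numbers instead of O(batch * d_in * d_out), so it pays off when the sequence length is small relative to the layer width; which variant is cheaper depends on these sizes.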
