KFOpt: Noise Reduction with Kalman Filter for Improving Differentially Private Optimization
Abstract
Differentially private (DP) optimizers have been widely used to train modern machine learning models while protecting the privacy of training data. A popular approach to privatizing an optimizer is to clip the individual gradients and add sufficiently large noise to the clipped gradients. However, a significant performance drop is observed when these optimizers are applied to large-scale model (pre-)training. This degradation stems from the substantial noise injection required to maintain DP, which disrupts the optimizer's dynamics. This paper introduces KFOpt, a novel framework designed to significantly enhance the performance of DP optimizers. KFOpt employs Kalman filtering, a technique drawn from control theory and signal processing, to effectively denoise privatized gradients and generate progressively refined gradient estimates. To ensure practicality for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands. We establish theoretical privacy-utility trade-off guarantees for KFOpt and demonstrate provable improvements over standard DP optimizers such as DPSGD. Extensive experiments across diverse tasks, including vision tasks on CIFAR-100 and ImageNet-1k and language fine-tuning on GLUE, E2E, and DART, validate the effectiveness of KFOpt. The results show that it significantly improves the performance of DP optimizers, surpassing state-of-the-art results under the same privacy constraints.
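To make the idea concrete, the sketch below illustrates one DP-SGD step (per-example clipping plus Gaussian noise) followed by a constant-gain Kalman-style smoother that blends a running gradient estimate with the new noisy observation. This is only an illustrative simplification under assumed names and hyperparameters (`kappa`, `clip_norm`, `noise_mult` are hypothetical); KFOpt's actual filter design, gain schedule, and simplifications are specified in the paper's method section, not here.

```python
import numpy as np

def dp_kalman_sgd_step(params, per_example_grads, state, lr=0.1,
                       clip_norm=1.0, noise_mult=1.0, kappa=0.7):
    """Illustrative DP-SGD step with a scalar Kalman-style gradient smoother.

    `kappa` is a fixed, hypothetical filter gain; the paper's KFOpt derives
    its update from the Kalman filtering equations rather than a constant.
    """
    # Per-example gradient clipping (standard DP-SGD).
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]

    # Sum the clipped gradients, add calibrated Gaussian noise, and average.
    batch = len(per_example_grads)
    noisy_grad = (np.sum(clipped, axis=0)
                  + noise_mult * clip_norm * np.random.randn(*params.shape)) / batch

    # Kalman-style correction: treat the previous estimate as the prediction
    # and the noisy privatized gradient as the observation, blending them to
    # reduce the variance introduced by the injected noise.
    prior = state.get("grad_estimate", np.zeros_like(params))
    filtered = prior + kappa * (noisy_grad - prior)
    state["grad_estimate"] = filtered

    # Descend along the denoised gradient estimate.
    return params - lr * filtered, state
```

With a constant gain this reduces to exponential smoothing of the privatized gradients; the full method replaces the fixed `kappa` with a filter whose state is maintained cheaply enough for large-scale training.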