NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Abstract
The performance of neural networks improves as the number of parameters grows. However, model sizes are constrained by the memory available on a device during training and inference. Although techniques such as quantization can alleviate this constraint, they trade performance for reduced memory. In this work, we introduce NeuZip, a new compression scheme for neural network weights that exploits the entropy structure of the individual components of the floating-point numbers that make up typical weights. With NeuZip, we achieve memory-efficient training without any loss in performance. In addition, our method reduces memory requirements during both the forward and backward passes, making it applicable to both training and inference. Our empirical evaluation across various models and datasets demonstrates that NeuZip reduces memory usage without sacrificing model quality.
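To make the intuition concrete, below is a minimal sketch of the underlying observation, not the NeuZip implementation itself: because trained weights are concentrated around zero, the 8-bit exponent field of each bfloat16 weight has low empirical entropy and therefore compresses well losslessly. Here zlib stands in for a fast entropy coder, and the Gaussian tensor is a stand-in for real trained weights.

```python
import zlib  # stand-in for a fast entropy coder; not what NeuZip uses

import torch


def bf16_exponents(weights: torch.Tensor) -> torch.Tensor:
    """Extract the 8-bit exponent field of each bfloat16 weight."""
    bits = weights.to(torch.bfloat16).view(torch.int16)
    # bfloat16 layout: sign(1) | exponent(8) | mantissa(7).
    # The & 0xFF mask discards the sign-extension from the arithmetic shift.
    return ((bits >> 7) & 0xFF).flatten()


def entropy_bits(values: torch.Tensor) -> float:
    """Empirical Shannon entropy in bits per value."""
    counts = torch.bincount(values.long(), minlength=256).float()
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * probs.log2()).sum())


# Trained weights are roughly zero-centered with small variance, so most
# exponents fall in a narrow band; a toy Gaussian tensor mimics this.
w = torch.randn(1_000_000) * 0.02
exps = bf16_exponents(w)
print(f"exponent entropy: {entropy_bits(exps):.2f} of 8 bits")

# Lossless compression of the exponent bytes alone (zlib as a proxy):
raw = exps.to(torch.uint8).numpy().tobytes()
print(f"compressed size: {len(zlib.compress(raw)) / len(raw):.2%} of original")
```

In this sketch only the exponent bytes are compressed; the sign and mantissa bits, which carry most of the entropy, would be stored as-is, so the representation remains exactly lossless.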