Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources. Neural network quantization substantially reduces the size of these intermediate results, but it often requires the full dataset and time-consuming fine-tuning to recover the accuracy lost after quantization. This paper introduces the first practical 4-bit post-training quantization approach: it involves neither training the quantized model (fine-tuning) nor access to the full dataset. We target the quantization of both activations and weights and propose three complementary methods for minimizing quantization error at the tensor level, two of which admit closed-form analytical solutions. Combining these methods, our approach achieves accuracy that is only a few percent below the state-of-the-art baseline across a wide range of convolutional models. The source code to replicate all experiments is available on GitHub: https://github.com/submission2019/cnn-quantization.
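To make the tensor-level idea concrete, the sketch below shows a minimal post-training quantizer, assuming NumPy: a weight (or activation) tensor is uniformly quantized to 4 bits within a symmetric clipping range, and the clipping threshold is chosen by a simple MSE grid search over a calibration tensor. This is only an illustrative stand-in; the paper's methods derive clipping values analytically rather than by search, and the function names here are hypothetical.

```python
# Illustrative sketch only (not the paper's closed-form method):
# symmetric uniform 4-bit quantization with an MSE-minimizing clipping threshold
# found by grid search over candidate thresholds.
import numpy as np

def quantize_uniform(x, alpha, n_bits=4):
    """Quantize x to signed n_bits levels, symmetrically, within [-alpha, alpha]."""
    levels = 2 ** (n_bits - 1) - 1          # e.g. 7 positive levels for 4-bit
    scale = alpha / levels
    x_clipped = np.clip(x, -alpha, alpha)
    return np.round(x_clipped / scale) * scale

def mse_optimal_clip(x, n_bits=4, num_candidates=100):
    """Pick the clipping threshold minimizing quantization MSE (grid search)."""
    max_abs = np.abs(x).max()
    alphas = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    errors = [np.mean((quantize_uniform(x, a, n_bits) - x) ** 2) for a in alphas]
    return alphas[int(np.argmin(errors))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=10_000)             # stand-in for a weight tensor
    alpha = mse_optimal_clip(w, n_bits=4)
    w_q = quantize_uniform(w, alpha, n_bits=4)
    print(f"clip threshold = {alpha:.3f}, quantization MSE = {np.mean((w_q - w) ** 2):.5f}")
```

Note that clipping trades off two error sources: a smaller threshold increases clipping error on the tails but shrinks the rounding step, which is exactly the balance the paper resolves analytically.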
Author Information
Ron Banner (Intel - Artificial Intelligence Products Group (AIPG))
Yury Nahshan (Intel - Artificial Intelligence Products Group (AIPG))
Daniel Soudry (Technion)
I am an assistant professor in the Department of Electrical Engineering at the Technion, working in the areas of machine learning and theoretical neuroscience. I am especially interested in all aspects of neural networks and deep learning. I did my post-doc (as a Gruss Lipper fellow) working with Prof. Liam Paninski in the Department of Statistics, the Center for Theoretical Neuroscience, the Grossman Center for Statistics of the Mind, the Kavli Institute for Brain Science, and the NeuroTechnology Center at Columbia University. I did my Ph.D. (2008-2013, direct track) in the Network Biology Research Laboratory in the Department of Electrical Engineering at the Technion, Israel Institute of Technology, under the guidance of Prof. Ron Meir. In 2008 I graduated summa cum laude with a B.Sc. in Electrical Engineering and a B.Sc. in Physics, after studying at the Technion since 2004.
More from the Same Authors
- 2020 Poster: Robust Quantization: One Model to Rule Them All
  Moran Shkolnik · Brian Chmiel · Ron Banner · Gil Shomron · Yury Nahshan · Alex Bronstein · Uri Weiser
- 2020 Poster: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
  Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2020 Spotlight: Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
  Edward Moroshko · Blake Woodworth · Suriya Gunasekar · Jason Lee · Nati Srebro · Daniel Soudry
- 2019 Poster: A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off
  Yaniv Blumenfeld · Dar Gilboa · Daniel Soudry
- 2018 Poster: Norm matters: efficient and accurate normalization schemes in deep networks
  Elad Hoffer · Ron Banner · Itay Golan · Daniel Soudry
- 2018 Spotlight: Norm matters: efficient and accurate normalization schemes in deep networks
  Elad Hoffer · Ron Banner · Itay Golan · Daniel Soudry
- 2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks
  Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro
- 2018 Poster: Scalable methods for 8-bit training of neural networks
  Ron Banner · Itay Hubara · Elad Hoffer · Daniel Soudry
- 2017 Poster: Train longer, generalize better: closing the generalization gap in large batch training of neural networks
  Elad Hoffer · Itay Hubara · Daniel Soudry
- 2017 Oral: Train longer, generalize better: closing the generalization gap in large batch training of neural networks
  Elad Hoffer · Itay Hubara · Daniel Soudry
- 2016 Poster: Binarized Neural Networks
  Itay Hubara · Matthieu Courbariaux · Daniel Soudry · Ran El-Yaniv · Yoshua Bengio
- 2014 Poster: Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights
  Daniel Soudry · Itay Hubara · Ron Meir