Low-precision CNN Model Quantization based on Optimal Scaling Factor Estimation

Abstract

With the development of convolutional neural networks (CNNs), researchers have achieved strong performance on computer vision tasks such as image classification and semantic segmentation. However, CNNs contain millions of weight parameters and demand large storage and computation resources, making them hard to deploy on resource-constrained hardware. To compress models and speed up deployment, many model compression methods have been proposed, including quantization methods that reduce network redundancy by decreasing the number of bits used to represent weights (and activations). For low-bit quantization, however, the inevitable quantization error leads to significant accuracy degradation and gradient mismatch. In this paper, we propose Scale Estimation Quantization (SEQ). To reduce the quantization error, we analyze the variance of the error introduced by the quantization process. By exploiting the distributions of network values, we reduce the quantization error and estimate the optimal scale parameters for our proposed quantization function. Furthermore, to address the gradient mismatch problem in backward propagation, we propose a backward approximation. We apply our algorithm to image classification tasks. Our method achieves performance close to that of its full-precision counterparts on VGG-Small and AlexNet with 1-bit weights and 2-bit activations.
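
To make the idea of a scale parameter concrete, the following is a minimal sketch of 1-bit weight quantization with a single scaling factor. It is not the SEQ estimator described in the abstract, whose details are not given here. It uses the well-known closed-form choice alpha = mean(|w|), which minimizes the squared error between w and alpha * sign(w). The function names, the Gaussian weight distribution, and the sample size are all assumptions made for illustration.

```python
import numpy as np

def binarize_with_scale(w):
    """Approximate w by alpha * signs, with signs in {-1, +1}.

    alpha = mean(|w|) is the least-squares optimal scale for this
    1-bit quantizer (an illustrative choice, not the paper's estimator).
    """
    signs = np.where(w >= 0, 1.0, -1.0)
    alpha = float(np.abs(w).mean())
    return alpha, signs

def quantization_mse(w, alpha, signs):
    """Mean squared error between the weights and their quantized form."""
    return float(np.mean((w - alpha * signs) ** 2))

# Hypothetical weights drawn from a zero-mean Gaussian (an assumption).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000)

alpha, signs = binarize_with_scale(w)
best_mse = quantization_mse(w, alpha, signs)

# Perturbing the scale in either direction should not reduce the error.
mse_smaller_scale = quantization_mse(w, 0.9 * alpha, signs)
mse_larger_scale = quantization_mse(w, 1.1 * alpha, signs)
```

In this sketch the error depends on the scale as a convex quadratic, so any scale other than mean(|w|) gives a larger mean squared error. That is the sense in which a scale parameter can be called optimal for a fixed quantization function.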

Publication
2019 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)
Li Song
Professor, IEEE Senior Member
