Paper Title

Efficient Adaptive Activation Rounding for Post-Training Quantization

Authors

Zhengyi Li, Cong Guo, Zhanda Zhu, Yangjie Zhou, Yuxian Qiu, Xiaotian Gao, Jingwen Leng, Minyi Guo

Abstract

Post-training quantization attracts increasing attention due to its convenience in deploying quantized neural networks. Although rounding-to-nearest remains the prevailing method for DNN quantization, prior research has demonstrated its suboptimality when applied to weight quantization, and proposes optimizing the weight rounding scheme by leveraging the output error rather than the traditional weight quantization error. Our study reveals that a similar rounding challenge extends to activation quantization. Although the idea generalizes easily, the challenge lies in the dynamic nature of activations: the rounding scheme must adapt to varying activations, which incurs runtime overhead. To tackle this, we propose the AQuant quantization framework, which takes the novel perspective of reducing output error by adjusting the rounding scheme of activations. Instead of the constant rounding border 0.5 used by the rounding-to-nearest operation, we make the border a function of the activation value, so that the adaptive border changes how activations are rounded. To handle the runtime overhead, we use a coarse-grained version of the border function. Finally, we introduce our framework to optimize the border function. Extensive experiments show that AQuant achieves notable improvements over state-of-the-art works and pushes the accuracy of ResNet-18 up to 60.31% under 2-bit weight and activation quantization.
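
As a rough illustration (not the authors' implementation), the sketch below contrasts plain rounding-to-nearest with border-based rounding, where the border is looked up from a coarse per-magnitude-bin table. The bin edges, placeholder border values, and names such as `quantize_adaptive_border` and `coarse_border` are hypothetical; in AQuant the borders would be optimized offline to reduce layer output error rather than set to constants.

```python
import numpy as np

def quantize_rtn(x, scale, n_bits=2):
    """Baseline rounding-to-nearest: a constant border of 0.5."""
    q = np.clip(np.round(x / scale), 0, 2 ** n_bits - 1)
    return q * scale

def quantize_adaptive_border(x, scale, border_fn, n_bits=2):
    """Border-based rounding: a value is rounded up only when its
    fractional part crosses a border that depends on the value itself."""
    v = x / scale
    low = np.floor(v)
    frac = v - low
    border = border_fn(x)                       # per-element border in (0, 1)
    q = np.clip(low + (frac >= border), 0, 2 ** n_bits - 1)
    return q * scale

# Coarse-grained border function: one border per activation-magnitude bin,
# so inference only needs a cheap table lookup instead of a per-element model.
bin_edges = np.linspace(0.0, 4.0, num=9)        # assumed activation range
bin_borders = np.full(len(bin_edges) - 1, 0.5)  # placeholders; AQuant would optimize these offline

def coarse_border(x):
    idx = np.clip(np.digitize(x, bin_edges) - 1, 0, len(bin_borders) - 1)
    return bin_borders[idx]

if __name__ == "__main__":
    x = np.abs(np.random.randn(8))              # toy unsigned activations
    scale = x.max() / (2 ** 2 - 1)              # simple min-max scale for 2-bit
    print(quantize_rtn(x, scale))
    print(quantize_adaptive_border(x, scale, coarse_border))
```

With all borders fixed at 0.5 the two functions coincide; the gains described in the paper come from choosing the per-bin borders so that the quantized activations minimize the layer's output error.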
