Paper Title

Bayesian Bits: Unifying Quantization and Pruning

Paper Authors

Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

Paper Abstract

We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide whether or not to add this quantized residual error for a higher effective bit width and lower quantization noise. By starting with a power-of-two bit width, this decomposition will always produce hardware-friendly configurations, and through an additional 0-bit option, serves as a unified view of pruning and quantization. Bayesian Bits then introduces learnable stochastic gates, which collectively control the bit width of the given tensor. As a result, we can obtain low bit solutions by performing approximate inference over the gates, with prior distributions that encourage most of them to be switched off. We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks that provide a better trade-off between accuracy and efficiency than their static bit width equivalents.
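To make the decomposition concrete, below is a minimal NumPy sketch of the sequential residual quantization the abstract describes. The function names (`quantize`, `bayesian_bits_forward`), the clipping range `[x_min, x_max]`, and the exact scale recursion are illustrative assumptions rather than the paper's reference implementation; in the method itself the gates are learnable stochastic variables optimized by gradient descent, not fixed 0/1 flags.

```python
import numpy as np

def quantize(x, scale):
    """Uniform quantizer: round to the nearest multiple of `scale`."""
    return scale * np.round(x / scale)

def bayesian_bits_forward(x, gates, x_min=-1.0, x_max=1.0):
    """Illustrative sketch of the Bayesian Bits decomposition.

    `gates` maps bit widths {2, 4, 8, 16, 32} to 0/1 values (hypothetical
    stand-ins for the paper's learnable stochastic gates). gates[2] acts as
    the 0-bit pruning gate; each later gate adds the quantized residual
    error between the clipped input and the running approximation, and only
    takes effect if all coarser gates are switched on.
    """
    x_c = np.clip(x, x_min, x_max)
    s = (x_max - x_min) / (2**2 - 1)       # grid spacing of the 2-bit baseline
    acc = quantize(x_c, s)                 # 2-bit approximation
    for b in (4, 8, 16, 32):
        # Doubling the bit width refines the grid; dividing the scale by
        # 2^(b/2) + 1 keeps the finer grid aligned with the coarser one
        # (an assumption consistent with covering [x_min, x_max] at every
        # bit width, since 2^b - 1 = (2^(b/2) - 1)(2^(b/2) + 1)).
        s = s / (2**(b // 2) + 1)
        if not gates[b]:
            break                          # gates are nested: stop refining
        acc = acc + quantize(x_c - acc, s) # add quantized residual error
    return gates[2] * acc                  # 0-bit option prunes the tensor

# Example: an effective 8-bit configuration (gates 16 and 32 switched off).
x = np.array([0.31, -0.72, 0.05, 0.96])
print(bayesian_bits_forward(x, {2: 1, 4: 1, 8: 1, 16: 0, 32: 0}))
```

With all gates on, the output approaches the full precision value; turning off gates[2] maps the whole tensor to zero, which is the unified pruning view the abstract refers to.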
