Paper Title


ZeroQ: A Novel Zero Shot Quantization Framework

Authors

Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Abstract


Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. ZeroQ enables mixed-precision quantization without any access to the training or validation data. This is achieved by optimizing for a Distilled Dataset, which is engineered to match the statistics of batch normalization across different layers of the network. ZeroQ supports both uniform and mixed-precision quantization. For the latter, we introduce a novel Pareto-frontier-based method to automatically determine the mixed-precision bit setting for all layers, with no manual search involved. We extensively test our proposed method on a diverse set of models, including ResNet18/50/152, MobileNetV2, ShuffleNet, SqueezeNext, and InceptionV3 on ImageNet, as well as RetinaNet-ResNet50 on the Microsoft COCO dataset. In particular, we show that ZeroQ can achieve 1.71% higher accuracy on MobileNetV2, as compared to the recently proposed DFQ method. Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch training time of ResNet50 on ImageNet). We have open-sourced the ZeroQ framework (https://github.com/amirgholami/ZeroQ).
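The core idea behind the Distilled Dataset can be illustrated with a heavily simplified sketch (not the authors' code): starting from random noise, run gradient descent on the input itself so that its statistics match the mean and variance stored in a batch-normalization layer. The toy below uses a single "layer" whose BN statistics are just target scalars; the real ZeroQ backpropagates through the entire frozen network and sums this matching loss over every BN layer.

```python
# Toy sketch of ZeroQ-style distilled-data synthesis (illustrative
# simplification, pure stdlib): optimize a synthetic input so its own
# mean/variance match the statistics stored in one batch-norm layer.
import random

random.seed(0)

TARGET_MEAN, TARGET_VAR = 0.5, 2.0   # stats assumed stored in the BN layer
N, LR, STEPS = 256, 2.0, 1000        # input size, step size, iterations

# random noise as the initial "distilled" input
x = [random.gauss(0.0, 1.0) for _ in range(N)]

def stats(v):
    m = sum(v) / len(v)
    return m, sum((e - m) ** 2 for e in v) / len(v)

for _ in range(STEPS):
    m, s2 = stats(x)
    # analytic gradient of (mean - mu)^2 + (var - v)^2 w.r.t. each x_i
    gm = 2.0 * (m - TARGET_MEAN) / N
    for i in range(N):
        gv = 2.0 * (s2 - TARGET_VAR) * 2.0 * (x[i] - m) / N
        x[i] -= LR * (gm + gv)

final_mean, final_var = stats(x)
print(final_mean, final_var)  # converges close to 0.5 and 2.0
```

In the actual framework the optimized inputs then serve as calibration data for choosing quantization ranges, so no real training or validation samples are ever needed.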
