Paper Title
Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation
Paper Authors
Paper Abstract
Deep neural networks virtually dominate the domain of most modern vision systems, providing high performance at a cost of increased computational complexity. Since those systems are often required to operate both in real-time and with minimal energy consumption (e.g., for wearable devices, autonomous vehicles, edge Internet of Things (IoT), or sensor networks), various network optimisation techniques are used, e.g., quantisation, pruning, or dedicated lightweight architectures. Due to the logarithmic distribution of weights in neural network layers, a method providing high performance with a significant reduction in computational precision (for 4-bit weights and less) is Power-of-Two (PoT) quantisation, whose quantisation levels therefore also follow a logarithmic distribution. This method makes it possible to replace the Multiply and ACcumulate (MAC) units typical for neural networks (performing, e.g., convolution operations) with more energy-efficient Bitshift and ACcumulate (BAC) units. In this paper, we show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 SoC FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version. To further reduce the actual power requirement by omitting part of the computation for zero weights, we also propose a new pruning method adapted to logarithmic quantisation.
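To make the PoT idea concrete, the minimal NumPy sketch below illustrates how weights restricted to signed powers of two let a dot product be computed with shifts and additions instead of multiplications. The function names, the exponent range, and the `frac_bits` fixed-point parameter are assumptions chosen for illustration, not the paper's exact quantisation scheme or accelerator design.

```python
import numpy as np

def pot_quantise(weights, bits=4):
    """Map each weight to the nearest signed power of two (illustrative sketch).

    Returns a sign in {-1, 0, +1} and an integer exponent per weight.
    The exponent range [-(2**(bits-1) - 1), 0] is an assumption for sub-unit weights.
    """
    sign = np.sign(weights).astype(np.int8)
    exponent = np.clip(
        np.round(np.log2(np.abs(weights) + 1e-12)),
        -(2 ** (bits - 1) - 1), 0,
    ).astype(np.int8)
    return sign, exponent

def bac_dot(activations, signs, exponents, frac_bits=8):
    """Bitshift-and-accumulate (BAC) dot product on integer activations.

    Multiplying an integer activation x by a weight of the form sign * 2**e
    reduces to an arithmetic shift of x by (e + frac_bits) bits, keeping
    frac_bits fractional bits in the accumulator, followed by a signed
    addition -- no hardware multiplier is needed.
    """
    acc = 0
    for x, s, e in zip(activations, signs, exponents):
        shift = int(e) + frac_bits  # non-negative for the exponent range above
        acc += int(s) * (int(x) << shift) if shift >= 0 else int(s) * (int(x) >> -shift)
    return acc / (1 << frac_bits)  # rescale back from fixed point
```

As a quick check, quantising weights such as `[0.23, -0.48, 0.06]` yields signs and exponents close to `[+1, -1, +1]` and `[-2, -1, -4]`, and `bac_dot` then approximates the floating-point dot product using only shifts and additions, which is the operation the BAC units in the accelerator perform in hardware.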