Paper Title
SeqPoint: Identifying Representative Iterations of Sequence-based Neural Networks
Paper Authors
Paper Abstract
The ubiquity of deep neural networks (DNNs) continues to rise, making them a crucial application class for hardware optimizations. However, detailed profiling and characterization of DNN training remains difficult as these applications often run for hours to days on real hardware. Prior works exploit the iterative nature of DNNs to profile a few training iterations. While such a strategy is sound for networks like convolutional neural networks (CNNs), where the nature of the computation is largely input independent, we observe in this work that this approach is sub-optimal for sequence-based neural networks (SQNNs) such as recurrent neural networks (RNNs). The amount and nature of computations in SQNNs can vary for each input, resulting in heterogeneity across iterations. Thus, arbitrarily selecting a few iterations is insufficient to accurately summarize the behavior of the entire training run. To tackle this challenge, we carefully study the factors that impact SQNN training iterations and identify input sequence length as the key determining factor for variations across iterations. We then use this observation to characterize all iterations of an SQNN training run (requiring no profiling or simulation of the application) and select representative iterations, which we term SeqPoints. We analyze two state-of-the-art SQNNs, DeepSpeech2 and Google's Neural Machine Translation (GNMT), and show that SeqPoints can represent their entire training runs accurately, resulting in geomean errors of only 0.11% and 0.53%, respectively, when projecting overall runtime and 0.13% and 1.50% when projecting speedups due to architectural changes. This high accuracy is achieved while reducing the time needed for profiling by 345x and 214x for the two networks compared to full training runs. As a result, SeqPoint can enable analysis of SQNN training runs in mere minutes instead of hours or days.
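To make the idea in the abstract concrete, here is a minimal sketch of SeqPoint-style selection: characterize every training iteration by its input sequence length, cluster the lengths, treat the iteration nearest each cluster centroid as a SeqPoint, and project full-run time as a cluster-size-weighted sum of the profiled SeqPoint times. The use of k-means and all function names below are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical sketch of SeqPoint-style selection: cluster training
# iterations by input sequence length and pick one representative
# (SeqPoint) per cluster. k-means is an assumption for illustration;
# the paper's exact selection procedure may differ.
import numpy as np
from sklearn.cluster import KMeans

def select_seqpoints(seq_lengths, k=5):
    """seq_lengths: per-iteration input sequence lengths (1-D array).
    Returns (seqpoint_indices, weights), one pair per cluster."""
    X = np.asarray(seq_lengths, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    seqpoints, weights = [], []
    for c in range(k):
        members = np.flatnonzero(km.labels_ == c)
        # Representative = iteration whose length is closest to the centroid.
        rep = members[np.argmin(np.abs(X[members, 0] - km.cluster_centers_[c, 0]))]
        seqpoints.append(int(rep))
        weights.append(len(members))  # cluster size acts as the weight
    return seqpoints, weights

def project_runtime(seqpoint_times, weights):
    """Estimate full-run time as a cluster-size-weighted sum of the
    measured per-SeqPoint iteration times."""
    return float(np.dot(seqpoint_times, weights))
```

In use, only the handful of selected iterations would be profiled on real hardware (or in simulation), and `project_runtime` would extrapolate to the full training run; this mirrors how SimPoint-style sampling weights representative program phases, which is the analogy the SeqPoint name suggests.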