Title
IOS: Inter-Operator Scheduler for CNN Acceleration
Authors
Abstract
To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization. However, a single operator can no longer fully utilize the available parallelism given the rapid advances in high-performance hardware, resulting in a large gap between the peak performance and the real performance. This performance gap is more severe under smaller batch sizes. In this work, we extensively study the parallelism between operators and propose Inter-Operator Scheduler (IOS) to automatically schedule multiple operators' parallel execution through a novel dynamic programming algorithm. IOS consistently outperforms state-of-the-art libraries (e.g., TensorRT) by 1.1 to 1.5x on modern CNN benchmarks. The code to reproduce each experiment is available at: https://github.com/mit-han-lab/inter-operator-scheduler.
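To make the idea of scheduling operators with dynamic programming concrete, here is a minimal sketch. It is not the authors' implementation (that is in the repository linked above); it only illustrates one way a dynamic program over sets of completed operators could split a small computation graph into stages of concurrently executed operators. The function `schedule`, the toy graph, the `util` values, and the stage cost model are all assumptions made for illustration. The sketch enumerates every possible stage, so it is only practical for small blocks.

```python
from functools import lru_cache
from itertools import combinations


def schedule(preds, latency, util):
    """Split a DAG of operators into stages that minimize total latency.

    preds:   dict mapping each operator to the set of its predecessors
    latency: dict mapping each operator to its standalone latency
    util:    dict mapping each operator to the fraction of the device it
             keeps busy when run alone (hypothetical, for the toy cost model)
    Returns (total_latency, stages), where stages is a tuple of tuples.
    """
    ops = sorted(preds)
    index = {op: i for i, op in enumerate(ops)}
    pred_mask = [sum(1 << index[p] for p in preds[op]) for op in ops]
    full = (1 << len(ops)) - 1

    def stage_cost(stage):
        # Toy cost model (assumption): operators in a stage overlap until
        # the device saturates, and a stage is never faster than its
        # slowest operator.
        work = sum(latency[ops[i]] * util[ops[i]] for i in stage)
        longest = max(latency[ops[i]] for i in stage)
        return max(work, longest)

    @lru_cache(maxsize=None)
    def best(done):
        # done is a bitmask of operators that have already been executed.
        if done == full:
            return 0.0, ()
        # An operator is ready if it is not done and all predecessors are.
        ready = [i for i in range(len(ops))
                 if not ((done >> i) & 1) and (pred_mask[i] & ~done) == 0]
        result = None
        # Try every non-empty subset of ready operators as the next stage.
        for k in range(1, len(ready) + 1):
            for stage in combinations(ready, k):
                mask = sum(1 << i for i in stage)
                rest_cost, rest_stages = best(done | mask)
                total = stage_cost(stage) + rest_cost
                if result is None or total < result[0]:
                    names = tuple(ops[i] for i in stage)
                    result = (total, (names,) + rest_stages)
        return result

    return best(0)


# Toy block: a feeds three parallel branches b, c, d, which all feed e.
preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"a"}, "e": {"b", "c", "d"}}
latency = {op: 1.0 for op in preds}
util = {"a": 1.0, "b": 0.4, "c": 0.4, "d": 0.4, "e": 1.0}

total, stages = schedule(preds, latency, util)
print(stages)  # (('a',), ('b', 'c', 'd'), ('e',))
```

Under this toy cost model, running the three small branches in one stage costs about 1.2 time units instead of 3.0 sequentially, which mirrors the abstract's point that a single operator may leave much of the hardware idle.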