Paper Title
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
Paper Authors
Paper Abstract
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications. However, they are also notorious for being slow in inference, which makes them difficult to deploy in real-time applications. We propose a simple but effective method, DeeBERT, to accelerate BERT inference. Our approach allows samples to exit earlier without passing through the entire model. Experiments show that DeeBERT is able to save up to ~40% inference time with minimal degradation in model quality. Further analyses show different behaviors in the BERT transformer layers and also reveal their redundancy. Our work provides new ideas to efficiently apply deep transformer-based models to downstream tasks. Code is available at https://github.com/castorini/DeeBERT.
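To make the early-exit idea concrete, the following is a minimal PyTorch sketch of inference with entropy-threshold early exiting: an exit classifier after each transformer layer lets a confident prediction skip the remaining layers. The class and parameter names (EarlyExitEncoder, exit_classifiers, entropy_threshold) are illustrative assumptions, not the actual DeeBERT implementation; see the linked repository for the real code.

    import torch
    import torch.nn as nn

    class EarlyExitEncoder(nn.Module):
        """Transformer encoder with an exit classifier ("off-ramp") after each layer.

        Illustrative sketch only; the layer modules, classifier heads, and the
        entropy threshold are assumptions, not the DeeBERT implementation.
        """

        def __init__(self, layers, exit_classifiers, entropy_threshold=0.5):
            super().__init__()
            self.layers = nn.ModuleList(layers)                      # e.g. 12 BERT transformer layers
            self.exit_classifiers = nn.ModuleList(exit_classifiers)  # one small linear head per layer
            self.entropy_threshold = entropy_threshold               # lower = stricter, fewer early exits

        @staticmethod
        def entropy(logits):
            # Entropy of the predicted class distribution; low entropy means a confident prediction.
            probs = torch.softmax(logits, dim=-1)
            return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

        @torch.no_grad()
        def forward(self, hidden_states):
            # Inference for a single example (batch size 1): exit at the first confident layer.
            for layer, classifier in zip(self.layers, self.exit_classifiers):
                hidden_states = layer(hidden_states)
                logits = classifier(hidden_states[:, 0])   # classify from the [CLS] position
                if self.entropy(logits).item() < self.entropy_threshold:
                    return logits                          # confident enough: skip remaining layers
            return logits                                  # fell through: use the final layer's prediction

Samples whose intermediate predictions are confident (low entropy) stop early and save the cost of the remaining layers, while harder samples still pass through the full model; this is what allows inference time savings with minimal loss in quality.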