Paper Title

DIET: Lightweight Language Understanding for Dialogue Systems

Authors

Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol

Abstract

Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.
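
To make the joint "dual intent and entity" setup concrete, below is a minimal PyTorch sketch of a shared transformer encoder with a sentence-level intent-classification head and a token-level entity-tagging head. This is an illustrative simplification, not the authors' implementation: it omits DIET's dot-product similarity losses, the CRF layer over entity tags, sparse/pre-trained feature inputs, and the optional masked-language-model objective, and all hyperparameters shown (d_model, n_heads, n_layers, max_len) are made-up placeholders.

```python
# Sketch of joint intent classification and entity tagging with a shared
# transformer encoder, in the spirit of DIET (simplified; see caveats above).
import torch
import torch.nn as nn


class JointIntentEntityModel(nn.Module):
    def __init__(self, vocab_size, n_intents, n_entity_tags,
                 d_model=256, n_heads=4, n_layers=2, max_len=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # A learned sentence-level vector acts as a [CLS]-style summary
        # position whose final state feeds the intent head.
        self.cls_emb = nn.Parameter(torch.randn(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.intent_head = nn.Linear(d_model, n_intents)      # sentence-level
        self.entity_head = nn.Linear(d_model, n_entity_tags)  # token-level

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer ids, 0 = padding
        batch, seq_len = token_ids.shape
        positions = torch.arange(seq_len, device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(positions)
        cls = self.cls_emb.expand(batch, 1, -1)
        x = torch.cat([cls, x], dim=1)  # prepend the summary position
        pad_mask = torch.cat(
            [torch.zeros(batch, 1, dtype=torch.bool, device=token_ids.device),
             token_ids == 0], dim=1)
        h = self.encoder(x, src_key_padding_mask=pad_mask)
        intent_logits = self.intent_head(h[:, 0])    # (batch, n_intents)
        entity_logits = self.entity_head(h[:, 1:])   # (batch, seq_len, n_tags)
        return intent_logits, entity_logits


if __name__ == "__main__":
    model = JointIntentEntityModel(vocab_size=1000, n_intents=7, n_entity_tags=9)
    tokens = torch.randint(1, 1000, (2, 12))          # two toy utterances
    intent_logits, entity_logits = model(tokens)
    print(intent_logits.shape, entity_logits.shape)   # (2, 7) (2, 12, 9)
```

In a joint setup like this, the intent cross-entropy loss and the per-token entity loss would simply be summed during training; the paper's point is that such a shared encoder can reach state-of-the-art accuracy without large pre-trained embeddings and at a fraction of BERT fine-tuning cost.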
