Paper Title

Training Language Models with Memory Augmentation

Paper Authors

Zexuan Zhong, Tao Lei, Danqi Chen

Paper Abstract

Recent work has improved language models (LMs) remarkably by equipping them with a non-parametric memory component. However, most existing approaches only introduce memories at testing time or represent them using a separately trained encoder, resulting in suboptimal training of the language model. In this work, we present TRIME, a novel yet simple training approach designed for training LMs with memory augmentation. Our approach uses a training objective that directly takes in-batch examples as accessible memory. We also present new methods for memory construction and data batching, which are used for adapting to different sets of memories (local, long-term, and external memory) at testing time. We evaluate TRIME on multiple language modeling and machine translation benchmarks and show that it is able to achieve significant improvements across all the settings. Concretely, TRIME reduces the perplexity from 18.70 to 15.37 on WIKITEXT-103, by effectively leveraging a large memory set from the training corpus. Compared to standard LM training, TRIME adds negligible computational overhead and is compatible with different neural architectures, making it a versatile solution for training memory-augmented LMs.
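The core idea, a training objective that treats other in-batch examples as accessible memory, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering (the function name `trime_style_loss`, the temperature handling, and the tensor shapes are assumptions for illustration, not the authors' released code): the gold token's probability mass comes both from its output embedding and from in-batch context vectors whose next token is the same gold token.

```python
import torch


def trime_style_loss(hidden, targets, output_emb, temperature=1.0):
    """Sketch of a memory-augmented training objective with in-batch memory.

    hidden:     [N, d] context vectors produced by the LM for N positions.
    targets:    [N]    gold next-token ids for those positions.
    output_emb: [V, d] output token embedding matrix.
    """
    # Standard token logits against the vocabulary.
    token_logits = hidden @ output_emb.t() / temperature            # [N, V]

    # Memory logits: similarity to the other in-batch context vectors,
    # excluding each position's similarity to itself.
    mem_logits = hidden @ hidden.t() / temperature                  # [N, N]
    mem_logits.fill_diagonal_(float("-inf"))

    # An in-batch memory entry is a positive if it shares the gold token.
    pos_mask = targets.unsqueeze(0) == targets.unsqueeze(1)         # [N, N]
    pos_mask.fill_diagonal_(False)

    # Denominator: normalize over vocabulary logits and all memory logits.
    all_logits = torch.cat([token_logits, mem_logits], dim=1)       # [N, V+N]
    log_denominator = torch.logsumexp(all_logits, dim=1)            # [N]

    # Numerator: gold-token logit plus the positive in-batch memory logits.
    gold_logits = token_logits.gather(1, targets.unsqueeze(1))      # [N, 1]
    pos_logits = mem_logits.masked_fill(~pos_mask, float("-inf"))   # [N, N]
    log_numerator = torch.logsumexp(
        torch.cat([gold_logits, pos_logits], dim=1), dim=1
    )                                                               # [N]

    return (log_denominator - log_numerator).mean()
```

Under this formulation, standard softmax training is recovered when no in-batch positives exist, so the extra cost over regular LM training is essentially one [N, N] similarity matrix per batch.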
