Paper Title

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

Paper Authors

Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

Paper Abstract

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentence-level language modeling perplexity, as well as on multiple syntax-sensitive language modeling evaluation metrics. Additionally, we find that the recursive syntactic composition bottleneck which represents each sentence as a single vector harms perplexity on document-level language modeling, providing evidence that a different kind of memory mechanism -- one that is independent of composed syntactic representations -- plays an important role in current successful models of long text.
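The abstract states that recursive syntactic composition is realized through a special attention mask applied to a deterministically transformed linearization of the parse tree (with each closing non-terminal duplicated). Below is a minimal sketch of how such a transformation and mask could be built, assuming a simple token format such as `(NP` / `NP)`; the function name `tg_transform_and_mask` is hypothetical, and the sketch abstracts away details of the paper's actual implementation (e.g., relative positions and subword handling).

```python
# A minimal sketch (not the authors' released implementation) of the two mechanisms
# named in the abstract: every closing non-terminal in the linearized tree is
# duplicated, and an attention mask lets the first copy ("compose") attend only to
# the constituent it closes, while every other token ("stack") attends only to the
# current stack of open and already-composed items. The token format and the
# function name are illustrative assumptions.
import numpy as np


def tg_transform_and_mask(tokens):
    """Return (dup_tokens, mask): the duplicated sequence and a boolean matrix
    where mask[i, j] == True iff position i may attend to position j."""
    dup, kinds = [], []
    for tok in tokens:
        if tok.endswith(")"):                 # closing non-terminal, e.g. 'NP)'
            dup += [tok, tok]                 # duplicate it
            kinds += ["compose", "stack_close"]
        else:
            dup.append(tok)
            kinds.append("open" if tok.startswith("(") else "terminal")

    n = len(dup)
    mask = np.zeros((n, n), dtype=bool)
    stack = []        # indices of items visible to STACK attention
    open_marks = []   # for each open constituent, its start index within `stack`
    for i, kind in enumerate(kinds):
        if kind == "compose":
            start = open_marks.pop()
            constituent = stack[start:]       # '(X' plus its children
            for j in constituent + [i]:
                mask[i, j] = True             # COMPOSE: attend only to what is being closed
            del stack[start:]                 # pop the children...
            stack.append(i)                   # ...and push the composed representation
        else:
            for j in stack + [i]:
                mask[i, j] = True             # STACK: attend to the current stack (and self)
            if kind == "open":
                open_marks.append(len(stack))
            if kind in ("open", "terminal"):
                stack.append(i)
            # 'stack_close' is not pushed; the preceding compose position already
            # represents the closed constituent on the stack
    return dup, mask


if __name__ == "__main__":
    toks = ["(S", "(NP", "the", "dog", "NP)", "(VP", "barks", "VP)", "S)"]
    dup, mask = tg_transform_and_mask(toks)
    print(dup)                                # closing non-terminals appear twice
    print(mask.astype(int))                   # 1 = attention allowed
```

In this toy example, the first `NP)` attends only to `(NP the dog`, after which that constituent is visible to later tokens only through its single composed position; this is the per-sentence composition bottleneck the abstract contrasts with memory mechanisms used in document-level models.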
