Paper Title

What Makes Data-to-Text Generation Hard for Pretrained Language Models?

Authors

Moniba Keymanesh, Adrian Benton, Mark Dredze

Abstract

Expressing natural language descriptions of structured facts or relations -- data-to-text generation (D2T) -- increases the accessibility of structured knowledge repositories. Previous work shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data. On the other hand, while auto-regressive PLMs can generalize from a few task examples, their efficacy at D2T is largely unexplored. Furthermore, we have an incomplete understanding of the limits of PLMs on D2T. In this work, we conduct an empirical study of both fine-tuned and auto-regressive PLMs on the DART multi-domain D2T dataset. We consider their performance as a function of the amount of task-specific data and how these data are incorporated into the models: zero and few-shot learning, and fine-tuning of model weights. In addition, we probe the limits of PLMs by measuring performance on subsets of the evaluation data: novel predicates and abstractive test examples. To improve the performance on these subsets, we investigate two techniques: providing predicate descriptions in the context and re-ranking generated candidates by information reflected in the source. Finally, we conduct a human evaluation of model errors and show that D2T generation tasks would benefit from datasets with more careful manual curation.
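
The abstract mentions re-ranking generated candidates by how much source information they reflect. The sketch below is only an illustration of that general idea, not the authors' method: it assumes DART-style (subject, predicate, object) triples as the source, a simple string-match coverage score, and a generator-provided log-probability for tie-breaking; all names and the scoring heuristic are hypothetical.

```python
# Minimal sketch of re-ranking candidates by source coverage (illustrative only).
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), as in DART

def source_coverage(candidate: str, triples: List[Triple]) -> float:
    """Fraction of source subjects/objects mentioned in the candidate text."""
    text = candidate.lower()
    values = {v.lower() for s, _, o in triples for v in (s, o)}
    if not values:
        return 0.0
    return sum(v in text for v in values) / len(values)

def rerank(candidates: List[Tuple[str, float]], triples: List[Triple]) -> List[str]:
    """Order (text, generator_score) pairs by coverage, then by generator score."""
    return [
        text
        for text, _ in sorted(
            candidates,
            key=lambda c: (source_coverage(c[0], triples), c[1]),
            reverse=True,
        )
    ]

if __name__ == "__main__":
    triples = [("Alan Turing", "birthPlace", "London"),
               ("Alan Turing", "field", "computer science")]
    candidates = [
        ("Alan Turing was a scientist.", -1.2),  # omits most source facts
        ("Alan Turing, born in London, worked in computer science.", -1.5),
    ]
    # Prefers the candidate that covers more of the source triples.
    print(rerank(candidates, triples)[0])
```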
