Paper Title
Probing Factually Grounded Content Transfer with Factual Ablation
Paper Authors
Abstract
Despite recent success, large neural models often generate factually incorrect text. Compounding this is the lack of a standard automatic evaluation for factuality: it cannot be meaningfully improved if it cannot be measured. Grounded generation promises a path to solving both of these problems: models draw on a reliable external document (grounding) for factual information, simplifying the challenge of factuality. Measuring factuality is also simplified, to factual consistency: testing whether the generation agrees with the grounding, rather than with all facts. Yet, without a standard automatic metric for factual consistency, factually grounded generation remains an open problem. We study this problem for content transfer, in which generations extend a prompt using information from factual grounding. In particular, this domain allows us to introduce the notion of factual ablation for automatically measuring factual consistency: it captures the intuition that the model should be less likely to produce an output given a less relevant grounding document. In practice, we measure this by presenting a model with two grounding documents, and the model should prefer to use the more factually relevant one. We contribute two evaluation sets to measure this. Applying our new evaluation, we propose multiple novel methods that improve over strong baselines.
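The factual-ablation test described in the abstract can be sketched as a pairwise comparison: score the reference continuation under the relevant grounding and under a less relevant (ablated) grounding, and check that the former wins. The sketch below is illustrative only; the scoring function is a hypothetical word-overlap proxy standing in for a real model's conditional likelihood, and all names and examples are assumptions, not the paper's actual implementation.

```python
def toy_log_score(output: str, prompt: str, grounding: str) -> float:
    """Toy proxy for log P(output | prompt, grounding): the fraction of
    output tokens that also appear in the grounding document.
    A hypothetical stand-in for a real model's likelihood."""
    out_tokens = [t.strip(".,").lower() for t in output.split()]
    ground_tokens = {t.strip(".,").lower() for t in grounding.split()}
    if not out_tokens:
        return 0.0
    return sum(t in ground_tokens for t in out_tokens) / len(out_tokens)

def passes_factual_ablation(score_fn, output, prompt, grounding, ablated_grounding):
    """Factual ablation: the output should score higher under the
    more factually relevant grounding than under the ablated one."""
    return score_fn(output, prompt, grounding) > score_fn(output, prompt, ablated_grounding)

# Illustrative example (invented data, not from the paper's evaluation sets).
prompt = "The Eiffel Tower"
output = "was completed in 1889 in Paris."
relevant = "The Eiffel Tower was completed in 1889 and stands in Paris, France."
ablated = "The Statue of Liberty was dedicated in 1886 in New York Harbor."

print(passes_factual_ablation(toy_log_score, output, prompt, relevant, ablated))  # True
```

In the paper's setting the scorer would be the generation model itself; here the toy overlap proxy simply makes the pairwise preference check concrete and runnable.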