Mark-Evaluate: Assessing Language Generation using Population Estimation Methods

Paper title

Mark-Evaluate: Assessing Language Generation using Population Estimation Methods

Authors

Gonçalo Mordido, Christoph Meinel

Abstract

We propose a family of metrics to assess language generation derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME$_\text{Petersen}$ and ME$_\text{CAPTURE}$, which retrieve a single-valued assessment, and ME$_\text{Schnabel}$, which returns a double-valued metric to assess the evaluation set in terms of quality and diversity separately. In synthetic experiments, our family of methods is sensitive to drops in quality and diversity. Moreover, our methods show a higher correlation with human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.
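
For readers unfamiliar with the ecological background, the following is a minimal sketch of the classical Lincoln-Petersen and Schnabel mark-recapture estimators, which the metric names ME$_\text{Petersen}$ and ME$_\text{Schnabel}$ allude to. It shows only the textbook population-size formulas; it does not reproduce the paper's metrics, and how text samples are treated as "marked" or "recaptured" is not described in this abstract. The function names and the example counts are hypothetical, chosen for illustration.

```python
from typing import Sequence


def petersen_estimate(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Lincoln-Petersen estimate of a closed population's size.

    marked_first:  animals caught and marked on the first occasion
    caught_second: animals caught on the second occasion
    recaptured:    animals in the second catch that already carry a mark
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed for the estimate")
    return marked_first * caught_second / recaptured


def schnabel_estimate(
    caught: Sequence[int],
    marked_before: Sequence[int],
    recaptured: Sequence[int],
) -> float:
    """Schnabel estimate over several sampling occasions.

    caught[t]:        animals caught on occasion t
    marked_before[t]: marked animals in the population just before occasion t
    recaptured[t]:    animals caught on occasion t that already carry a mark
    """
    if not (len(caught) == len(marked_before) == len(recaptured)):
        raise ValueError("all sequences must have one entry per occasion")
    total_recaptured = sum(recaptured)
    if total_recaptured <= 0:
        raise ValueError("at least one recapture is needed for the estimate")
    weighted_catches = sum(c * m for c, m in zip(caught, marked_before))
    return weighted_catches / total_recaptured


# Hypothetical counts, for illustration only.
print(petersen_estimate(marked_first=100, caught_second=80, recaptured=20))
print(schnabel_estimate(caught=[50, 60, 40],
                        marked_before=[0, 50, 95],
                        recaptured=[0, 15, 10]))
```

Both estimators assume a closed population (no births, deaths, or migration between sampling occasions), which matches the "closed populations" wording in the abstract.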
