Paper Title
Perception Score: A Learned Metric for Open-ended Text Generation Evaluation
Paper Authors
Abstract
Automatic evaluation of open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score. The method measures the overall quality of a generation and scores it holistically, rather than focusing on a single evaluation criterion such as word overlap. Moreover, it reports the uncertainty of its evaluation result. By incorporating this uncertainty, Perception Score gives a more accurate evaluation of the generation system. Perception Score achieves state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.
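The abstract describes a learned metric that both scores a generation holistically and reports the uncertainty of that score. The following is a minimal sketch of that general idea, not the paper's actual model: it stands in for a learned scorer with a toy stochastic scoring function, runs multiple stochastic passes (in the spirit of MC-dropout uncertainty estimation), and reports the mean score together with its standard deviation as an uncertainty estimate. The function names and the toy quality proxy are all hypothetical.

```python
import random

def noisy_score(text: str, seed: int) -> float:
    """Stand-in for one stochastic forward pass of a learned scorer.

    A real learned metric would run a neural regression head with
    dropout enabled; here a toy quality proxy plus Gaussian noise
    merely illustrates the interface (hypothetical, not the paper's model).
    """
    rng = random.Random(seed)
    base = min(len(set(text.split())) / 10.0, 1.0)  # toy quality proxy in [0, 1]
    return base + rng.gauss(0.0, 0.05)

def score_with_uncertainty(text: str, passes: int = 50) -> tuple:
    """Aggregate several stochastic passes into (mean score, uncertainty)."""
    samples = [noisy_score(text, seed) for seed in range(passes)]
    mean = sum(samples) / passes
    var = sum((x - mean) ** 2 for x in samples) / passes
    return mean, var ** 0.5  # std of the samples as the uncertainty

score, uncertainty = score_with_uncertainty("the quick brown fox jumps")
```

A downstream evaluation could then weight or flag system-level scores by this uncertainty, which is the kind of "more accurate evaluation" the abstract attributes to incorporating uncertainty.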